I, VISHWANATH R. IYER, Ph.D., declare and state as
follows:

1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in 
yeast and human. Immediately prior to this position, I spent 
four years as a postdoctoral fellow in the laboratory of 
Patrick O. Brown at Stanford University studying the 
transcriptional programs of yeast and of human cells. My 
curriculum vitae is attached hereto as Exhibit A. 

2. Beginning in Dr. Brown's laboratory, where I 
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used 
microarray-based gene expression analysis as a principal 
approach in much of my research. 

3. Representative publications describing this 
work include: 



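DeRisi et al., "Exploring the metabolic and genetic
control of gene expression on a genomic
scale," Science 278:680-686 (1997); Marton et al., "Drug
target validation and identification of secondary drug
target effects using DNA microarrays," Nature Med.
4:1293-1301 (1998); Iyer et al., "The transcriptional
program in the response of human fibroblasts to serum,"
Science 283:83-87 (1999); and Ross et al., "Systematic
variation in gene expression patterns in human cancer
cell lines," Nature Genetics 24:227-235 (2000).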
Two of the papers describe our use of microarray-based
expression profiling to explore the metabolic reprogramming
that occurs during major environmental changes, both in yeast
(DeRisi et al., during the shift from fermentation to
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation
and identification of secondary drug effects (Marton et al.).
And one describes our use of expression profiling as a
molecular phenotyping tool to discriminate among human cancer
cells (Ross et al.).

4. Whether used to elucidate basic physiological 
responses, to study primary and secondary drug effects, or to 
discriminate and classify human cancers, expression profiling 
as we have practiced it relies for its power on comparison of
patterns of expression.

5. For example, we have demonstrated that we can
use the presence or absence of a characteristic drug
"signature" pattern of altered gene expression in drug-treated
cells to explore the mechanism of drug action and to identify
secondary effects that can signal potentially deleterious drug
side effects. As another example, we have demonstrated that
gene expression patterns can be used to classify human tumor
cell lines. While it is of course advantageous to know the
biological function of the encoded gene products in order to
reach a better understanding of the cellular mechanisms
underlying these results, these pattern-based analyses do not
require knowledge of the biological function of the encoded
gene products.

6. The resolution of the patterns used in such
comparisons is determined by the number of genes detected: the
greater the number of genes detected, the higher the
resolution of the pattern. It goes without saying that higher
resolution patterns are generally more useful in such
comparisons than lower resolution patterns. With such higher
resolution comes a correspondingly higher degree of
statistical confidence for distinguishing different patterns,
as well as identifying similar ones.

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript,
at least to a first approximation.* Each new gene-specific
probe added to a microarray thus increases the number of genes
detectable by the device, increasing the resolving power of
the device. As I note above, higher resolution patterns are
generally more useful in comparisons than lower resolution
patterns. Accordingly, each new gene probe added to a
microarray increases the usefulness of the device in gene
expression profiling analyses. This proposition is so well-
established as to be virtually an axiom in the art, and has
been as long as I have been working in the field, and
certainly since the time I embarked on the production of whole
genome arrays in early 1996. Simply put, arrays with fewer
gene-specific probes are inferior to arrays with more gene-
specific probes.

* In a more nuanced view, it [...]

8. For example, our ability to subdivide cancers
into discrete classes by expression profiling is limited
by the resolution of the patterns produced. With more genes
contributing to the expression patterns, we can potentially
draw finer distinctions among the patterns, thus subdividing
otherwise indistinguishable cancers into a greater number of
classes; the greater the number of classes, the greater the
likelihood that the cancers classified together will respond
similarly to therapeutic intervention, permitting better
individualization of therapy and, we hope, better treatment
outcomes.

9. If a gene does not change expression in an
experiment, or if a gene is not expressed and produces no
signal in an experiment, that is not to say that the probe
lacks usefulness on the array; it only means that an
insufficient number of conditions have been sampled to
identify expression changes. In fact, an experiment showing
that a gene is not expressed or that its expression level does
not change can be equally informative. To provide maximum
versatility as a research tool, the microarray should
include - and as a biologist I would want my microarray to
include - each newly identified gene as a probe.

* [...] without discriminating among them, and for a probe to
signal the presence of a variety of allelic variants of a
single gene, again without discriminating among them.

10. I declare further that all statements made 
herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may
jeopardize the validity of any patent application in which
this declaration is filed or any patent that issues thereon.
Vishwanath R. Iyer

Assistant Professor 

Section of Molecular Genetics and Microbiology 

Institute of Cellular and Molecular Biology 

MBB3.212A, University of Texas at Austin 

Austin, TX 78712-0159 

Phone: 512-232-7833 

Fax: 512-232-3432 

Email: vishy@mail.utexas.edu 

Education/Training 

Bombay University Mumbai, India B.Sc. (1987), Chemistry & Biochemistry 

M. S. University of Baroda, Baroda, India M.Sc. (1989), Biotechnology 

Harvard University, Cambridge MA Ph.D. (1996), Genetics 

Stanford University, Stanford CA Post-doctoral (1996-2000), Genomics 

Research Experience 

9/00-5/03 Assistant professor, Section of Molecular Genetics and 
Microbiology, University of Texas, Austin TX 

■ Global transcriptional control in yeast 

■ Gene expression programs during human cell proliferation 

■ Genome-wide transcription factor targets in yeast and human
■ Collaborative microarray facility

5/96-8/00 Post-doctoral fellow Stanford University, Stanford CA
(Advisor: Dr. Patrick O. Brown)

■ Yeast whole-genome ORF and intergenic microarrays 

■ Human cDNA microarrays for expression profiling 

9/89-4/96 Graduate student Harvard University, Cambridge MA 

(Advisor: Dr. Kevin Struhl) 

■ Yeast transcriptional regulation 



Honours and Awards 

Government of India Biotechnology Fellowship (1987-1989) 
University Grants Commission Junior Research Fellowship (1989) 
Stanford University/NHGRI Genome Training Grant (1996) 

Invited Conference talks (selected) 

Invited Lecturer, NEC-Princeton Lectures in Biophysics 

Princeton, NJ (June 1998) 
Plenary Session Speaker, HGM '99 (HUGO Human Genome Meeting) 

Brisbane, Australia (April 1999) 
Invited Speaker, Gordon Research Conference "Human Molecular Genetics" 

Newport, RI (August 2001) 



[...] "Oncogenomics 2002" Conference

[...] Symposium, University of Michigan

[...] Genomic Approaches to Transcriptional
Regulation, Cold Spring Harbor Laboratory Meeting (March 200[...])

Symposium co-Chair and Speaker, "Functional Genomics," American Society for
Biochemistry and Molecular Biology Meeting, San Diego, CA (April 2003)

Invited Speaker in Functional Genomics (Gene Networks) Symposium, International
Congress of Genetics, Melbourne, Australia (July 6-11, 2003)

Invited Speaker, "BioArrays Europe 2003"
Cambridge, UK (Sep/Oct 2003)

Departmental Seminars 

[...] Genetics, Biochemistry & Biophysics Departments,

New York University School of Medicine, Department of Biochemistry
November 20 2002

UT Southwestern Medical Center, Human Genetics Seminar Series
May 5 2002

UCLA School of Medicine, Department of Human Genetics
June 2 2003

National Human Genome Research Institute
June 12 2003

Sanger Institute of the Wellcome Trust, Hinxton UK
Sep 2003

Other Professional Activities 

[...] Genome Biology, Genome Research, Nature Genetics, Science (1998-[...])

[...] Cold Spring Harbor Summer Course "Making and Using DNA Microarrays"
Member, NIDDK Special Emphasis Review Panel ZDK1 (2001-2002)

Publications 

1. Iyer V. & Struhl K. (1995) Poly(dA:dT), a ubiquitous promoter element that
stimulates transcription via its intrinsic DNA structure. EMBO J. 14: 2570-2579.

2. Iyer V. & Struhl K. (1995) Mechanism of differential utilization of the his3 TR and TC
TATA elements. Mol. Cell. Biol. 15: 7059-7066.

3. Iyer V. & Struhl K. (1996) Absolute mRNA levels and transcription initiation rates in
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. (USA) 93: 5208-5212.



4. DeRisi J. L., Iyer V. R. & Brown P. O. (1997) Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science 278: 680-686.

5. Marton M. J., DeRisi J. L., Bennett H. A., Iyer V. R., Meyer M. R., Roberts C. J.,
Stoughton R., Burchard J., Slade D., Dai H., Bassett D. E. Jr., Hartwell L. H., Brown
P. O. & Friend S. H. (1998) Drug target validation and identification of secondary
drug target effects using DNA microarrays. Nature Med. 4: 1293-1301.

6. Lutfiyya L. L., Iyer V. R., DeRisi J., DeVit M. J., Brown P. O. & Johnston M. (1998)
Characterization of three related glucose repressors and genes they regulate in
Saccharomyces cerevisiae. Genetics 150: 1377-1391.

7. Spellman P. T., Sherlock G., Zhang M. Q., Iyer V. R., Anders K., Eisen M. B., Brown P.
O., Botstein D. & Futcher B. (1998) Comprehensive identification of cell cycle-
regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Mol. Biol. Cell 9: 3273-3297.

8. Iyer V. R., Eisen M. B., Ross D. T., Schuler G., Moore T., Lee J. C. F., Trent J. M.,
Staudt L. M., Hudson J. Jr., Boguski M. S., Lashkari D., Shalon D., Botstein D. &
Brown P. O. (1999) The transcriptional program in the response of human
fibroblasts to serum. Science 283: 83-87.

9. [...] L. & Iyer V. R. (1999) Genomics and array technology. Curr. Opin. Oncol. [...]

10. Ross D. T., Scherf U., Eisen M. B., Perou C. M., Spellman P., Iyer V. R., Rees C.,
Jeffrey S. S., Van de Rijn M., Waltham M., Pergamenschikov A., Lee J. C. F.,
Lashkari D., Shalon D., Myers T. G., Weinstein J. N., Botstein D. & Brown P. O.
(2000) Systematic variation in gene expression patterns in human cancer cell lines.
Nature Genetics 24: 227-235.

11. Sudarsanam P., Iyer V. R., Brown P. O. & Winston F. (2000) Whole-genome
expression analysis of snf/swi mutants of S. cerevisiae. Proc. Natl. Acad. Sci. (USA)
97: 3364-3369.

12. Tran H. G., Steger D. J., Iyer V. R. & Johnson A. D. (2000) The chromo domain
protein Chd1p from budding yeast is an ATP-dependent chromatin-modifying factor.
EMBO J. 19: 2323-2331.

13. Gross C., Kelleher M., Iyer V. R., Brown P. O. & Winge D. R. (2000) Identification
of the copper regulon in Saccharomyces cerevisiae by DNA microarrays. J. Biol.
Chem. 275: 32310-32316.

14. Reid J. L., Iyer V. R., Brown P. O. & Struhl K. (2000) Coordinate regulation of yeast
ribosomal protein genes is associated with targeted recruitment of Esa1 histone
acetylase. Mol. Cell 6: 1297-1307.



15. Iyer V. R., Horak C. E., Scafe C. S., Botstein D., Snyder M. & Brown P. O. (2001)
Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF.
Nature 409: 533-538.

16. Miki R., Kadota K., Bono H., Mizuno Y., Tomaru Y., Carninci P., Itoh M., Shibata K.,
Kawai J., Konno H., Watanabe S., Sato K., Tokusumi Y., Kikuchi N., Ishii Y.,
Hamaguchi Y., Nishizuka I., Goto H., Nitanda H., Satomi S., Yoshiki A., Kusakabe
M., DeRisi J. L., Eisen M. B., Iyer V. R., Brown P. O., Muramatsu M., Shimada H.,
Okazaki Y. & Hayashizaki Y. (2001) Delineating developmental and metabolic
pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length
enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. (USA) 98: 2199-2204.

18. Iyer V. R. Microarray-based detection of DNA-protein interactions: Chromatin
Immunoprecipitation on Microarrays, in DNA Microarrays: A Molecular Cloning
Manual (eds. Bowtell D. & Sambrook J.) 453-463 (Cold Spring Harbor Laboratory
Press [...])

*(not peer reviewed)

19. Killion P., Sherlock G. & Iyer V. R. (2003) The Longhorn Array Database: an
open-source implementation of the Stanford Microarray Database. BMC
Bioinformatics 4: 32.

20. Hahn J. S., Hu Z., Thiele D. J. & Iyer V. R. Genome-Wide Analysis of the Biology of
Stress Responses Through Heat Shock Transcription Factor [...]

21. Kim J. & Iyer V. R. The global role of TBP recruitment to promoters in mediating
gene expression profiles (manuscript in preparation)



Current/Pending Research Support 

U01 AA13518-01 Adron Harris (PI) 25% effort 

9/28/01 - 9/27/06 

NIH/NIAAA 

"INIA: Microarray Core" 

[...] the Integrative Neuroscience Initiative on Alcoholism
[...]. The overall goal is to support the use of microarray technology
[...] gene expression [...] predictor [...] consumption [...]
Role: Co-investigator 



003658-0223-2001 Iyer (PI) 16% effort 
01/01/02-08/31/04 

Texas Higher Education Coordinating Board (ARP) 

"Microarray based global mapping of DNA-protein interactions at promoters in human
[...]"

[...] project to [...] interactions of transcription factors with human
promoters
Role: PI 



Information Technology Research 0325116 R. Mooney (PI) 0% effort
09/01/03 - 08/31/07

NSF

"[...] Multi-Source Data Mining to Experimentation for Gene Network
Discovery"
Role: Co-investigator 



1 R01 CA95548-01A2 (pending) Iyer (PI) 25% effort 

12/1/03 - 11/30/08 

NIH 

"Analysis of genome-wide transcriptional control in yeast" 

[...] transcription factors in yeast through [...]

Role: PI 



Breast Cancer Idea Award (pending) Iyer (PI) 10% effort 
1/1/04 - 12/31/06 

US Army Medical Research and Materiel Command 

"Genome-wide chromosomal targets of oncogenic transcription factors" 

This is a project aimed at identifying direct chromosomal targets of c-myc and ER in 

human cells through the use of a novel sequence tag analysis method 

003658-0531-2003 (pending) Marcotte (PI) 8% effort 
01/01/04 - 12/31/05 

Texas Higher Education Coordinating Board (ATP) 

"[...] novel high-throughput platform [...] measuring gene function on a
genomic scale"

[...] proposes [...] developing a novel microarray based platform for automated
[...] microscopic [...] allowing rapid and systematic [...]
Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used
to carry out a comprehensive investigation of the temporal program of gene expression
accompanying the metabolic shift from fermentation to respiration. The expression
profiles observed for genes with known metabolic functions pointed to features of the
metabolic reprogramming that occur during the diauxic shift, and the expression patterns
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional
activator YAP1. These results demonstrate the feasibility and utility of this approach
to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

*E-mail: pbrown@cmgm.stanford.edu

Saccharomyces cerevisiae is an especially
favorable organism in which to conduct a
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by
using a simple robotic printing device (9).
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse tran- 
scription in the presence of Cy3 (green)- 
or Cy5(red)-labeled deoxyuridine triphos- 
phate (dUTP) (11) and then hybridized to 
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
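
As an illustration of the two-color design described above, the measurement at each array element can be sketched as a Cy5/Cy3 intensity ratio. This is a minimal sketch and not the authors' analysis pipeline: the gene names and intensity values are hypothetical, and the log2 transform is an assumed convention that makes induction and repression symmetric about zero.

```python
import math


def log2_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3) for one array element.

    cy5: intensity from the time-point sample (red channel).
    cy3: intensity from the reference sample (green channel).
    Both values are assumed to be positive, background-subtracted intensities.
    """
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)


# Hypothetical intensities (Cy5, Cy3) for three array elements.
spots = {
    "geneA": (1200.0, 300.0),   # fourfold higher than reference
    "geneB": (250.0, 1000.0),   # fourfold lower than reference
    "geneC": (800.0, 800.0),    # unchanged
}

ratios = {name: log2_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
```

With seven time points and approximately 6400 array elements, the upper bound on the number of such ratios is 7 × 6400 = 44,800, which is consistent with the "more than 43,000" measurements reported.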

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these differences
was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 



SCIENCE • VOL 278 • 24 OCTOBER 1997 • www.sciencemag.org 






to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
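
The tallies of induced and repressed genes reported above amount to counting expression ratios that cross a fold-change threshold in either direction. The following is a minimal sketch under stated assumptions: the ratios are hypothetical values (time point relative to reference), and the function name `count_by_fold` is introduced here for illustration only.

```python
def count_by_fold(ratios, fold):
    """Count genes induced or repressed by at least `fold`.

    ratios: iterable of expression ratios (sample / reference), all positive.
    Returns (number induced, number repressed).
    """
    induced = sum(1 for r in ratios if r >= fold)
    repressed = sum(1 for r in ratios if r <= 1.0 / fold)
    return induced, repressed


# Hypothetical expression ratios for eight genes.
example_ratios = [0.2, 0.45, 0.9, 1.1, 2.0, 2.5, 4.0, 9.0]

two_up, two_down = count_by_fold(example_ratios, 2.0)
four_up, four_down = count_by_fold(example_ratios, 4.0)
```

Applied to the full data set, the same counting with thresholds of 2 and 4 would correspond to the roughly 710/1030 and 183/203 genes cited in the text.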

The global view of changes in expression
of genes with known functions provides
a vivid picture of the way in which
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme A (CoA)
synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-biphos- 
phatase, switches the directions of two key 
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glucose-6-phosphate.
Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cycle
and carbohydrate storage, were coordinately
induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal
proteins were generally induced rather than
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at
the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-repressed,
and five of the seven were previously
noted to share a common upstream activating
sequence (UAS), the carbon source response
element (CSRE) (16-20). A search
in the promoter regions of the remaining two
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
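
The "late induction" criterion described above (more than ninefold at the last timepoint but less than threefold at the preceding one) can be expressed as a simple filter over each gene's time series. This is an illustrative sketch only: the gene identifiers and profiles are hypothetical, and the function name `is_late_induced` is not taken from the original work.

```python
def is_late_induced(profile, last_min=9.0, prev_max=3.0):
    """Return True if a profile of fold-induction values shows late induction.

    profile: fold induction relative to the reference at successive timepoints.
    The thresholds follow the criterion stated in the text: more than ninefold
    at the last timepoint, less than threefold at the preceding timepoint.
    """
    if len(profile) < 2:
        return False
    return profile[-1] > last_min and profile[-2] < prev_max


# Hypothetical seven-point profiles for three genes.
profiles = {
    "ORF_late": [1.0, 1.1, 1.0, 1.3, 1.5, 2.4, 11.0],
    "ORF_gradual": [1.0, 2.0, 4.0, 6.0, 8.0, 9.5, 10.0],
    "ORF_flat": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

late_genes = sorted(name for name, p in profiles.items() if is_late_induced(p))
```

Only the first hypothetical profile satisfies both conditions; the gradually induced gene exceeds ninefold at the end but fails the threefold limit at the preceding timepoint.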

Examples from additional groups of 
genes that shared expression profiles are
illustrated in Fig. 5, C through F. The
sequences upstream of the named genes in
Fig. 5C all contain stress response elements
(STRE), and with the exception



Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The
[...] scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture
density of <5 × 10^6 cells/ml and media glucose level of 19 g/liter) by reverse transcription in the
presence of Cy3-dUTP. [...] the same culture 9.5 hours later (culture density of ~2 × 10^8 cells/ml,
with a glucose level of [...]) by reverse transcription in the presence of Cy5-dUTP. In this image,
hybridization of the Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is
represented as a green signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA
expression at 9.5 hours) is represented as a red signal. Thus, genes induced or repressed after the
diauxic shift appear in this image as red and green spots, respectively. Genes expressed at roughly
equal levels before and after the diauxic shift appear in this image as yellow spots.






of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the sequences
upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, revealed
that each of these genes also possesses
repeated upstream copies of the stress-responsive
CCCCT motif. Of the 13 additional
genes in the yeast genome that
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
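
A search for a degenerate consensus such as TNRYTGGB can be sketched by expanding each IUPAC ambiguity code into a character class and scanning both strands of an upstream sequence. The example below is a minimal sketch and not the computer analysis cited in the text (29, 30): the upstream sequence is invented for illustration, and scanning both strands is an assumption.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a consensus into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (start, strand) pairs; start is in forward-strand coordinates."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(consensus), "-")
             for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical upstream sequence containing one site on each strand.
upstream = "GGTGATTGGTCCGGACCAATCAGG"
sites = find_sites(upstream, "TNRYTGGB")
```

On the minus strand the consensus reads as VCCARYNA on the forward sequence, which is why a match there contains the CCAAT-like core.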

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UASrpg)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed consensus
Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be responsible
for the decline in ribosomal protein
gene expression (34). Indeed, we observed
that the abundance of RAP1
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcriptional
activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold in- 
duction of SIP4 upon depletion of glucose 
strongly suggests a role in the induction of 



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expression
ratios measured in these duplicate
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
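
The two reproducibility figures quoted above (a correlation coefficient between duplicate hybridizations, and the fraction of genes whose ratios agree within a factor of 2) can be illustrated with a short sketch. The ratio values below are hypothetical, and computing the correlation on log-transformed ratios is an assumption; the text does not state which scale was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fraction_within_fold(ratios_a, ratios_b, fold=2.0):
    """Fraction of genes whose duplicate ratios differ by less than `fold`."""
    limit = math.log2(fold)
    within = sum(
        1 for a, b in zip(ratios_a, ratios_b)
        if abs(math.log2(a) - math.log2(b)) < limit
    )
    return within / len(ratios_a)

# Hypothetical expression ratios for six genes in two duplicate hybridizations.
rep1 = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
rep2 = [0.30, 0.4, 1.1, 1.8, 9.0, 7.0]

log1 = [math.log2(r) for r in rep1]
log2_ = [math.log2(r) for r in rep2]
print("correlation of log ratios:", round(pearson(log1, log2_), 2))
print("fraction within twofold:", round(fraction_within_fold(rep1, rep2), 2))
```

Working in log space treats a twofold induction and a twofold repression symmetrically, which is why the agreement test compares differences of logarithms rather than differences of raw ratios.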

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mutation
and YAP1 overexpression,
red spots represent
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture. 
by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the
transcriptional co-repressors Tup1 and Cyc8/Ssn6
(39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-specific,
and DNA-damage-inducible genes
(40).
Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from exponentially
growing cells from the two populations
and used to prepare cDNA labeled
with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are available
on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene encoding
invertase, and all five hexose transporter
genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corresponding
to expression of TUP1 itself was
also severely reduced because of the (incomplete)
deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, although
only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory systems
that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct functional
classes.
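
To give a sense of how unlikely the flocculation result would be by chance, the following sketch computes a binomial tail probability: the chance of seeing at least 6 of 13 genes affected if each gene independently had the genome-wide 3% probability of being TUP1-repressed. This calculation is an illustration added here and is not part of the reported analysis; the independence assumption and the choice of a binomial model are simplifications.

```python
from math import comb

def binomial_tail(k, n, p):
    """Probability of at least k successes in n independent trials of probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 6 of the 13 flocculation genes were induced, against a background rate of about 3%.
p_value = binomial_tail(6, 13, 0.03)
print(f"P(at least 6 of 13 by chance) = {p_value:.2e}")
```

A hypergeometric model, which samples without replacement from the finite genome, would be the more exact choice, but with roughly 6000 genes the binomial approximation differs very little.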

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this approach
to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are normally
repressed (or induced) under the
conditions of the experiment. For instance,
the experiment described here analyzed
a MATα strain in which MFA1
and MFA2, the genes encoding the a-factor
mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything regarding
the role of Tup1 in the repression
of these genes. Conversely, we cannot distinguish
indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its participation
in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overexpress
the gene that encodes it. YAP1 encodes
a DNA-binding transcription factor
belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydrogen
peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We analyzed
differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpression).
Complementary DNA from the control
and YAP1-overexpressing strains, labeled
with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpressing
YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreductases
(Fig. 2 and Table 1). An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxidoreductases.
Very little is known about
the role of aryl-alcohol oxidoreductases in

isolated from ligninolytic fungi, in which
they participate in coupled redox reactions,
oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment belong
to the same small, functional group of
oxidoreductases suggests that these genes



Fig. 4. Coordinated regulation
of functionally related
genes. The curves
represent the average induction
or repression ratios
for all the genes in
each indicated group.
The total number of
genes in each group was
as follows: ribosomal
proteins, 112; translation
elongation and initiation
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or, less likely,
may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.

Use of a DNA microarray to characterize
the transcriptional consequences of
mutations affecting the activity of regulatory
molecules provides a simple and powerful
approach to dissection and characterization
of regulatory pathways and networks.
This strategy also has an important
practical application in drug screening.
Mutations in specific genes encoding candidate
drug targets can serve as surrogates
for the ideal chemical inhibitor or modulator
of their activity. DNA microarrays
can be used to define the resulting signature
pattern of alterations in gene expression,
and then subsequently used in an
assay to screen for compounds that reproduce
the desired signature pattern.

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 
required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and
simplicity. It was feasible for a small group
to accomplish the amplification of more
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expression
in diverse other natural processes and
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 
We describe here a method for drug target validation and identification of secondary drug target
effects based on genome-wide gene expression patterns. The method is demonstrated by
several experiments, including treatment of yeast mutant strains defective in calcineurin,
immunophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or
absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated
cells with a mutation in the gene encoding a putative target established whether that target was
required to generate the drug signature. Drug dependent effects were seen in 'targetless' cells,
showing that FK506 affects additional pathways independent of calcineurin and the
immunophilins. The described method permits the direct confirmation of drug targets and recognition
of drug-dependent changes in gene expression that are modulated through pathways
distinct from the drug's intended target. Such a method may prove useful in improving the efficiency
of drug development programs.



Good drugs are potent and specific: that is, they must have
strong effects on a specific biological pathway and minimal ef- 
fects on all other pathways. Confirmation that a compound in- 
hibits the intended target (drug target validation) and the 
identification of undesirable secondary effects are among the 
main challenges in developing new drugs. Comprehensive 
methods that enable researchers to determine which genes or 
activities are affected by a given drug might improve the effi- 
ciency of the drug discovery process by quickly identifying po- 
tential protein targets, or by accelerating the identification of 
compounds likely to be toxic. DNA microarray technology,
which permits simultaneous measurement of the expression 
levels of thousands of genes, provides a comprehensive frame- 
work to determine how a compound affects cellular metabolism 
and regulation on a genomic scale 1 ' 1 '. DNA microarrays that 
contain essentially every open reading frame (ORF) in the 
Saccharomyces cerevisiae genome have already been used success- 
fully to explore the changes in gene expression that accompany 
large changes in cellular metabolism or cell cycle progression'' 10 . 

In the modern drug discovery paradigm, which typically be- 
gins with the selection of a single molecular target, the ideal in- 
hibitory drug is one that inhibits a single gene product so 
completely and so specifically that it is as if the gene product 
were absent. Treating cells with such a drug should induce 
changes in gene expression very similar to those resulting from 
deleting the gene encoding the drug's target. Here we have compared
the genome-wide effects on gene expression that result
from deletions of various genes in the budding yeast S. cerevisiae
to the effects on gene expression that result from treatment 



with known inhibitors of those gene products. Using the cal- 
cineurin signaling pathway as a model system, we tested an ap- 
proach that permits identification of genes that encode proteins 
specifically involved in pathways affected by a drug. The FK506 
characteristic pattern, or 'signature', of altered gene expression
was not observed in mutant cells lacking proteins inhibited by 
FK506 (for example, a calcineurin or FK506-binding-protein 
mutant strain), but was observed in mutants deleted for genes 
in pathways unrelated to FK506 action (for example, a cy- 
clophilin mutant strain). Conversely, the cyclosporin A (CsA) 
signature was not observed in CsA-treated calcineurin or cy- 
clophilin mutant strains, but was seen in an FK506-binding-pro- 
tein mutant strain treated with CsA. The method also 
demonstrates that FK506, a clinically used immunosuppressant,
has 'off-target' effects that are independent of its binding to
immunophilins. Thus, the approach we describe may provide a
way to identify the pathways altered by a drug and to detect 
drug effects mediated through unintended targets. 

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a 
model of an ideal inhibitory drug, we examined the effects on 
gene expression associated with pharmacological or genetic in- 
hibition of calcineurin function. Calcineurin is a highly con- 
served calcium- and calmodulin-activated serine/threonine 
protein phosphatase implicated in diverse processes dependent 
on calcium signaling' 2 13 . In budding yeast, calcineurin is
required for intracellular ion homeostasis' 4 , for adaptation to prolonged
mating pheromone treatment' 1 and in the regulation of
Fig. 1 Model of antagonism of the calcineurin signaling pathway mediated
by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a catalytic
subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes),
and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B
(CnB). After entering cells, FK506 and CsA specifically bind and inhibit the
peptidyl-proline isomerase activity of their respective immunophilins, FK506
binding proteins (FKBP) and cyclophilins (CyP). The most abundant
immunophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin inhibition.
Drug-immunophilin complexes bind and inhibit the calcium- and
calmodulin-stimulated phosphatase calcineurin. Among the substrates of calcineurin
are transcriptional activators that act to modulate gene expression.



the onset of mitosis". In mammals, calcineurin has been impli- 
cated in T-cell activation", in apoptosis 17 . in cardiac hypertro- 
phy ,§ and in the transition from short-term to long-term 
memory'*. In both organisms, calcineurin activity is inhibited 
by FK506 and CsA, immunosuppressant drugs whose effects on 
calcineurin are mediated through families of intracellular receptor
proteins called immunophilins 1 " 0 (Fig. 1). To assess the effects
of pharmacologic inhibition of calcineurin, wild-type S.
cerevisiae was grown to early logarithmic phase in the presence
or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and
CNA2) had been deleted* 1 (referred to as the cna or calcineurin
mutant), were grown in parallel, in the absence of the drug.
Fluorescently-labeled cDNA was prepared by reverse transcription
of polyA+ RNA in the presence of Cy3- or Cy5-deoxynucleotide
triphosphates and then hybridized to a microarray
containing more than 6,000 DNA probes representing 97% of
the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from mock-treated
cells and Cy3-labeled cDNA from cells treated with 1
µg/ml FK506 allowed the effect of drug treatment on mRNA levels
of each ORF to be determined (Fig. 2a and b and data not
shown). Similarly, effects of the calcineurin mutations on the 
mRNA levels of each gene were assessed by simultaneous hy- 
bridization of Cy5-labeled cDNA from wild-type cells and Cy3- 
labeled cDNA from the calcineurin mutant strain (Fig. 2c). For 
each comparison of this kind, reported expression ratios are the 
average of at least two hybridizations in which the Cy3 and Cy5 
fluors were reversed to remove biases that may be introduced by 
gene-specific differences in incorporation of the two fluors 
(data not shown). 
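
The fluor-reversal averaging described above can be sketched as follows. The sketch assumes a simple multiplicative model in which a gene-specific dye bias scales the red/green ratio equally in both hybridizations, so that averaging the two log ratios (with the sign of one flipped) cancels the bias. The function name and the numbers are hypothetical and only illustrate the idea.

```python
import math

def dye_swap_ratio(forward_rg, reversed_rg):
    """Estimate the treated/reference expression ratio from a dye-swap pair.

    forward_rg:  red/green ratio with reference cDNA in Cy5 (red) and
                 treated cDNA in Cy3 (green).
    reversed_rg: red/green ratio with the fluors exchanged.

    Assumes a multiplicative dye bias b on the red/green ratio:
        forward_rg  = (reference / treated) * b
        reversed_rg = (treated / reference) * b
    """
    log_ratio = (math.log2(reversed_rg) - math.log2(forward_rg)) / 2
    return 2 ** log_ratio

# Hypothetical gene: true fourfold induction, dye bias of 1.5 toward red.
true_ratio, bias = 4.0, 1.5
forward = (1 / true_ratio) * bias   # 0.375
reverse = true_ratio * bias         # 6.0
print(dye_swap_ratio(forward, reverse))
```

Either hybridization alone would misreport the induction (as about 2.7-fold or 6-fold in this example); the pair together recovers the underlying ratio under the stated model.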

Treatment with FK506 in these growth conditions resulted in 
a signature pattern of altered gene expression in which mRNA 
levels of 36 ORFs changed by more than twofold 
(http://www.rosetta.org). A very similar pattern of altered gene 
expression was observed when the calcineurin mutant strain 
was compared to wild-type cells. Comparison of the changes in 
mRNA expression of each gene resulting from treatment of 
wild-type cells with FK506 with mRNA expression changes re- 
sulting from deletion of the calcineurin genes showed the con- 
siderable similarity of the global transcript alterations in 
response to the two perturbations (Fig. 2b-d). Quantification of
this similarity using the correlation coefficient (ρ) showed
large correlations between the FK506 treatment signature and
the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as
the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the
YER071C gene; ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment
signature was also compared with those of more than 40 other
deletion mutant strains or drug-treatments thought to affect
unrelated pathways, and none had statistically significant correlations.
These data establish that genetic disruption of calcineurin
function provides a close and specific phenocopy of
treatment with FK506 or CsA. 

To avoid generalizing from a single example, we also compared
the effects of treatment of wild-type cells with 3-aminotriazole
(3-AT) with the effects of deletion of the HIS3 gene. HIS3
encodes imidazoleglycerol phosphate dehydratase, which catalyzes
the seventh step of the histidine biosynthetic pathway in
yeast. 3-AT is a competitive inhibitor of this enzyme that triggers
a large transcriptional amino-acid starvation response.
Microarray analysis of wild-type and isogenic his3-deficient
strains demonstrated the expected large genome-wide transcriptional
responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig.
3c). Quantitative comparison of the 3-AT treatment signature
and the his3 mutant signature showed a high level of correlation
(ρ = 0.76 ± 0.02) that even extended to genes that experienced
small changes in expression level (Fig. 3b). As a negative control,
the correlations between the 3-AT treatment signature or the
his3 mutant signature and the calcineurin mutant strain were
not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04,
respectively). That both the calcineurin/FK506 and the his3/3-AT
comparisons were highly correlated indicates that in many cases
the expression profile resulting from a gene deletion closely resembles
the expression profile of wild-type cells treated with an
inhibitor of that gene's product.
'Decoder' strategy: Drug target validation with deletion mutants
Because pharmacological inhibition of different targets might
give similar or identical expression profiles, simple comparison
of drug signatures to mutant signatures is unlikely to unambiguously
identify a drug's target. To overcome this limitation, an
additional 'decoder' step is used. We first compare the expression
profile of wild-type drug-treated cells to the expression profiles
from a panel of genetic mutant strains, using a correlation
coefficient metric. Mutant strains whose expression profile is
similar to that of drug-treated wild-type cells are selected and
subjected to drug treatment, generating the drug signature in
the mutant strain (that is, the mutant drug signature). If the
mutated gene encodes a protein involved in a pathway affected
by the drug, we expect the drug signature in mutant cells to be
different (or absent, for an ideal drug) from the drug signature
seen in wild-type cells.
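
The two-step logic of the 'decoder' strategy can be sketched in code. This is a minimal illustration under stated assumptions: profiles are lists of log expression ratios over the same ORFs, similarity is measured with a Pearson correlation, and the cutoff values (0.5) and all strain names and numbers are hypothetical. The text does not specify numerical cutoffs.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def decode(wt_drug, mutant_profiles, mutant_drug_profiles,
           similar=0.5, retained=0.5):
    """Return mutants that mimic the drug AND lose the drug signature.

    Step 1: keep mutants whose own profile correlates with the
            wild-type drug signature (correlation >= similar).
    Step 2: among those, keep mutants in which drug treatment no longer
            reproduces the wild-type signature (correlation < retained).
    The thresholds are illustrative assumptions.
    """
    candidates = []
    for name, profile in mutant_profiles.items():
        if pearson(wt_drug, profile) < similar:
            continue  # mutant does not phenocopy the drug
        if pearson(wt_drug, mutant_drug_profiles[name]) < retained:
            candidates.append(name)  # drug signature lost in this mutant
    return candidates

# Hypothetical log-ratio profiles over five ORFs.
wt_drug = [1.0, -1.0, 0.5, 0.0, 2.0]
mutant_profiles = {
    "target": [0.9, -1.1, 0.4, 0.1, 1.8],
    "bystander": [1.1, -0.9, 0.6, 0.0, 2.1],
    "unrelated": [0.0, 0.3, -0.2, 0.1, -0.1],
}
mutant_drug_profiles = {
    "target": [0.1, 0.1, -0.1, 0.0, -0.1],     # signature lost
    "bystander": [1.0, -1.0, 0.4, 0.1, 1.9],   # signature retained
    "unrelated": [1.0, -0.9, 0.5, 0.1, 2.0],
}
print(decode(wt_drug, mutant_profiles, mutant_drug_profiles))
```

In this toy panel only the "target" strain passes both steps: the "bystander" strain resembles the drug signature but still responds to the drug, and the "unrelated" strain is filtered out in the first step.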

NATURE MEDICINE • VOLUME 4 • NUMBER 11 • NOVEMBER 1998

© 1998 Nature America Inc. • http://medicine.nature.com




Fig. 2 Expression profiles from FK506-treated wild-type (wt) cells and a
calcineurin-disruption mutant strain share a genome-wide correlation. DNA
microarray analysis showing changes in gene expression resulting from FK506
treatment (a and b) or from genetic disruption of genes encoding calcineurin
(c). a, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated strain R563 and Cy3-labeled cDNA
(green) from strain R563 treated with 1 µg/ml FK506. b, Enlarged view of the
boxed area in a. Arrowheads indicate specific ORFs induced or repressed.
c, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from strain R563 and Cy3-labeled cDNA (green) from
strain MCY300 (deleted for the CNA1 CNA2 catalytic subunits of calcineurin).
Arrows indicate specific ORFs induced or repressed. d, The log10 of the
expression ratio for each ORF derived from the FK506 treatment hybridizations
is plotted versus the log10 of the expression ratio in the calcineurin mutant
hybridizations (panel title: wt vs. calcineurin mutant; axis: log10 (R/G),
calcineurin mutation). ORFs that were induced or repressed in both experiments
are shown as green and red dots, respectively. e, The log10 of the expression
ratio for each ORF derived from the FK506 treatment hybridizations is plotted
versus the log10 of the expression ratio in the yer071c mutant hybridizations
(axis: log10 (R/G), yer071c mutation). No ORFs were induced or repressed in
both experiments.



To illustrate this, we treated the his3 mutant strain with 3-AT.
The signature pattern of altered gene expression resulting
from treatment of the mutant strain with 3-AT was much less
complex than that of the 3-AT signature in wild-type cells (Fig.
4). This is seen simply by examining plots of mean intensity of
the hybridization signal (which approximately reflects level of
expression) versus the expression ratio for each ORF (Fig. 4).
Genes that were expressed at higher or lower levels in 3-AT
treated cells or in his3 mutant cells are shown as red and green
dots, respectively. We analyzed the 3-AT signature in wild-type
(Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mutant
strain signature (Fig. 4b). Whereas histidine limitation induced
by 3-AT induced more than 1,000 transcription-level
changes in the wild-type strain, few or no transcript level
changes were induced by treatment of the his3-deletion strain
with 3-AT. This indicates that with the growth conditions used,
essentially all of the effects of 3-AT depend on or are mediated
through the HIS3 gene product.

Applying this approach to the calcineurin signaling pathway 
showed the specificity of the method. The calcineurin mutant 
strain and strains with deletions in the genes encoding the 
most abundant immunophilins in yeast 12 (CPH1 and FPR1)
were treated with either FK506 or CsA to determine the profiles 



Table 1  Signature correlation of expression ratios as a result of FK506
treatment in various mutant strains

                       wild-type      cna            fpr1           cna fpr1       cph1
                       +/- FK506      +/- FK506      +/- FK506      +/- FK506      +/- FK506
wild-type +/- FK506    0.93 ± 0.04    -0.01 ± 0.07   -0.23 ± 0.07   0.12 ± 0.07    0.79 ± 0.03

Correlation coefficients show the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1
(major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the
correlation between two pairs of hybridizations from independent wild-type +/- FK506 experiments.
of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the signature
generated by treatment of wild-type cells with FK506 was
highly correlated to the calcineurin mutant strain signature (ρ
= 0.75 ± 0.03), it bore no similarity to the profile after treatment
of the calcineurin mutant strain with FK506 (ρ = -0.01 ±
0.07). This indicates that FK506 was unable to elicit its normal
transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07), indicating
that the FPR1 gene product is likely to be involved in the
pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating that the cph1 mutation did not block the mode of action
of FK506 and thus that the CPH1 gene product is not directly
involved in the pathway affected by FK506. We tabulated the
change in expression in response
to FK506 in different mutant strains for all ORFs with
expression ratios greater than 1.8 in FK506-treated cells or in
the calcineurin mutant strain (Fig. 5a). The
calcineurin mutant strain signature and the
FK506 responses in wild-type and the cph1
mutant strain are similar, and there are no
transcript-level changes (seen in black) for
treatment of the calcineurin, fpr1 and cna
fpr1 mutant strains with FK506 (Fig. 5a).

Similar experiments and analyses with CsA 
provided further validation of this approach. 
The expression profile elicited by treatment 
of wild-type cells with CsA was highly corre- 
Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt) cells
treated with 3-AT share a genome-wide correlation. DNA microarray analysis
showing changes in gene expression resulting from 3-AT treatment (a) or from
genetic disruption of the HIS3 gene (c). a, Pseudo-color image of the results
of simultaneous hybridization of Cy5-labeled cDNA (red) from mock-treated
wild-type strain R491 and Cy3-labeled cDNA (green) from strain R491 treated
with 10 mM 3-AT. b, The log10 of the expression ratio for each ORF derived
from the 3-AT treatment hybridizations is plotted versus the log10 of the
expression ratio in the his3 mutant hybridizations (panel title: wt vs. his3
mutation; axis: log10 (R/G), his3 mutation). ORFs that were induced or
repressed in both experiments are shown as green and red dots, respectively.
The correlation of expression ratios applies not only to genes with large
expression ratios (for example, CHA1 and ARG1), but also extends to genes with
expression ratios less than 2 (for example, ILV1 and CPH1). ILV1 is induced
1.9-fold and 1.5-fold, and CPH1 is downregulated 1.9-fold and 1.7-fold, in
cells treated with 3-AT and his3 mutant cells, respectively. Two ORFs do not
fall on the line x = y. The leftmost point is the HIS3 data point, which is
induced by 3-AT treatment but which is not absent from the his3 mutant strain.
The other point is YOR203w. Both data points are labeled HIS3 because
hybridization to YOR203w is most likely due to HIS3 mRNA, as YOR203w overlaps
the HIS3 open reading frame. c, Pseudo-color image of the results of
simultaneous hybridization of Cy5-labeled cDNA (red) from wild-type strain
R491 and Cy3-labeled cDNA (green) from strain R1226, deleted for the HIS3
gene. Arrowheads indicate specific ORFs induced or repressed.






lated to the profile elicited by mutation of the calcineurin genes 
(ρ = 0.71 ± 0.04), but did not correlate with the expression profile
resulting from treatment of the calcineurin mutant strain
with CsA (ρ = -0.05 ± 0.07; Table 2), indicating that the genetic
deletion of calcineurin interfered with the ability of CsA to
elicit its normal transcriptional response. Likewise, the CsA signature
was essentially absent in CsA-treated cph1 mutant cells,
and the expression profile of CsA-treated cph1 mutant cells correlated
poorly to that of CsA-treated wild-type cells (ρ = 0.18 ±
0.07). Thus, the CPH1 gene product was required for the CsA response
seen in wild-type cells. Conversely, treatment of fpr1
mutant cells with CsA resulted in an expression pattern very
similar to the profile of CsA-treated wild-type cells (ρ = 0.77 ±
0.03), indicating that FPR1 was not necessary for the CsA-mediated
effects. Analysis of individual ORFs affected by CsA and
their expression ratios over the entire set of experiments confirmed
that CPH1 and the genes encoding calcineurin, but not
FPR1, are necessary for the wild-type CsA response (Fig. 5b). The
observation that the profiles resulting from FK506 or CsA drug
treatment are similar to that of the calcineurin deletion mutant
strain might allow the prediction that calcineurin was involved
in the pathway affected by these drugs. But because the expression
profile of the fpr1 mutant strain did not bear a strong similarity
to the wild-type drug expression profile for FK506, it is
obvious that the drug treatment of the mutant strains was necessary
to identify Fpr1, but not Cph1, as a potential FK506 drug
target. In the same way, the 'decoder' strategy was necessary to
identify Cph1, but not Fpr1, as a potential drug target for CsA.

'Decoder' approach can identify secondary drug effects 
For a drug that has a single biochemical target, the strategy out- 
lined above may be useful in target validation. In many cases, 
however, a compound may affect multiple pathways and elicit 
a very complex signature. 'Decoding' such a complex signature 
Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly
complete loss of 3-AT signature. A plot of the log10 of the mean intensity
of hybridization for each ORF versus the log10 of its expression ratio for
each experiment is shown next to a pseudo-color image of a representative
portion of the microarray. ORFs that are induced or repressed at the 95%
confidence level are shown in green and red, respectively. a, Expression
profile from treatment of the wild-type (wt) strain with 3-AT. Cy5-labeled
cDNA (red) from mock-treated strain R491 and Cy3-labeled cDNA (green) from
strain R491 treated with 10 mM 3-AT. b, Expression profile from the his3
deletion strain. Cy5-labeled cDNA (red) from strain R491 and Cy3-labeled
cDNA (green) from strain R1226, deleted for the HIS3 gene. c, Expression
profile of treatment of the his3 deletion strain with 3-AT. Cy3-labeled
cDNA (red) from his3-deleted strain R1226 and Cy5-labeled cDNA (green)
from strain R1226 treated with 10 mM 3-AT. Arrowheads indicate the DNA
probe and data point corresponding to the HIS3 gene. The blue dashed line
represents the threshold below which errors tend to increase rapidly
because spot intensities are not sufficiently above background intensity.
into the effects mediated through the intended target (the 'on-target'
signature) and those mediated through unintended targets (the 'off-target'
signature) might be useful in evaluating a compound's specificity. Our
'decoder' strategy is based on the premise that the 'off-target' signature
should be insensitive to the genetic disruption of the primary target.

To determine whether the 'decoder' approach could identify an 'off-target'
profile, we looked for drug-responsive genes whose expression is
insensitive to deletion of the primary target. To increase the likelihood
of observing such genes, the same strains described in Tables 1 and 2 were
treated with higher concentrations (50 µg/ml) of FK506. This led to a much
more complex expression profile in wild-type cells, suggesting that at
this higher concentration, FK506 was inhibiting or activating additional
targets. Several of the ORFs in this expanded FK506-induced expression
profile were not affected by the calcineurin, cph1 or fpr1 mutations, as
drug treatment of these mutant strains did not block their presence in the
FK506 signature (Fig. 6). This 'off-target' signature indicates that FK506
was triggering changes in transcript levels of many genes through pathways
independent of calcineurin, CPH1 and FPR1. Many of the ORFs in the
'off-target' pathway were genes reported to be regulated by the
transcriptional activator Gcn4 (ref. 24). In some strains, a reporter gene
under GCN4 control was induced in response to FK506 treatment. To
determine whether GCN4 is involved in this pathway that is independent of
calcineurin, CPH1 and FPR1, we analyzed the effects of treatment with
high-dose FK506 on global gene expression in a strain with a GCN4 deletion
(Fig. 6). Of the [...] ORFs with calcineurin-independent expression ratios
greater than 4, [...] by FK506 was GCN4-dependent. Not all GCN4-regulated
genes were induced by FK506. This FK506-induced subset may be those genes
most sensitive to subtle changes in Gcn4 levels, or perhaps other
regulatory circuits prevent FK506 activation of some GCN4-regulated genes.
Seven of the remaining nine ORFs induced by FK506 were independent of
Fig. 5 Response of FK506 and CsA signature genes in strains with deletions
in different genes. Genes with expression ratios [...] in response to
treatment with 1 µg/ml FK506 (a) or [...] CsA (b) [...] and their
expression ratios [...] green (induction)-red (repression) color scale.
a, Calcineurin (cna) mutant and FK506 treatment signature genes are in the
[...] columns. Almost all FK506 signature genes have expression ratios
[...] involved in pathways affected by FK506 [...] but not in deletion
strains in unrelated pathways [...]. b, Calcineurin (cna) mutant and CsA
treatment signature genes are in the [...] columns. Almost all CsA
signature genes have expression ratios [...] in deletion strains involved
in pathways affected by CsA [...], but not in deletion strains in
unrelated pathways [...].
both the calcineurin and GCN4 pathways. The simplest explanation is that
FK506 inhibits or activates additional pathways. Members of this class
include SNQ2 and PDR5, genes that encode drug efflux pumps with structural
homology to mammalian multiple drug resistance proteins. FK506 may
interact directly with Pdr5 to inhibit its function. Our results indicate
that treatment with FK506 leads to fourfold-to-sixfold induction of PDR5
mRNA levels. YOR1, another gene that can confer drug resistance, is also
induced threefold-to-fourfold by FK506. Thus, drug treatment of strains
with mutations in the [...] may prove useful in [...] effects mediated by
secondary drug targets, including the nature and extent of [...]
[...] no similarity to the wild-type drug expression profile. In contrast,
drug-mediated signatures from strains with mutations in genes involved in
pathways unrelated to the drug's action showed extensive similarity to the
wild-type drug signature. By applying this approach to a drug that affects
multiple pathways (FK506), we were able to decode a complex signature into
component parts, including the identification of an 'off-target' signature
that was mediated through pathways independent of calcineurin or the Fpr1
immunophilin.

Discussion 

It is well-established that high-throughput biochemical screening can
identify potent inhibitory compounds against a given target. The 'decoder'
approach described here complements this process by evaluating the equally
important property of specificity: the tendency of a compound to inhibit
pathways other than that of its intended target. The ability to observe
such 'off-target' effects will likely be useful in several ways. Profiling
compounds with known toxicities will allow the development of a database
of expression changes associated with particular toxicities. Recognition
of potential toxicities in the 'off-target' signatures of otherwise
promising compounds then may allow earlier identification of those likely
to fail in clinical trials. Comparing the extent and peculiarities of
'off-target' signatures of promising drug candidates could provide a new
way to group compounds by their effects on secondary pathways even before
those effects are understood. This may prove to be an alternative,
potentially more effective, way to select compounds for animal and
clinical trials. Some drugs are more effective against a related protein
than against the originally intended target. Sildenafil (Viagra), for
example, was initially developed as a phosphodiesterase inhibitor to
control cardiac contractility, but was found to be highly specific for
phosphodiesterase 5, an isozyme whose inhibition overcomes defects in
Fig. 6 Response of FK506 signature genes in strains with deletions in
different genes. Genes with expression ratios greater than [...] or [...]
in at least one experiment are listed, and their expression ratios in the
indicated strain are shown in the green (induction)-red (repression) color
scale. The genes have been [...] corresponding to these expected
behaviors: 'CNA[...]-dependent' genes respond to FK506 (50 µg/ml) except
when [...] FPR1 or both are deleted; 'GCN4-dependent' genes respond to
FK506 except when GCN4 is deleted. These genes still respond to FK506 when
calcineurin genes or FPR1 or CPH1 are deleted, [...] responses are not
mediated by calcineurin, Cph1 or Fpr1. 'CNA- and [...] tested. A 'complex
behavior' class is provided for those genes that did not match the model
of FK506 response [...] calcineurin or Fpr1 or separately through Gcn4.

penile erection. It is possible that application of the 'decoder' to other
compounds may show that they too have a [...] against [...].

The ability to decode drug effects is dependent on the availability of
functionally 'targetless' cells. In yeast, this [...] each yeast gene
(Saccharomyces Deletion Consortium:
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain expression profiles from each deletion
mutant strain. Determining signatures resulting from inactivation of
essential genes presents a unique problem, but it may be possible by
examining heterozygotes or by using a controllable promoter to reduce
expression of the essential gene. Although it is already feasible to test
several compounds in dozens of yeast strains, another challenge for the
'decoder' strategy will be the efficient selection of the mutants with
deletions in genes most likely to encode the intended drug target. The
signature correlation plots described are one metric that could be used as
part of that selection process, but others need to be developed. Applying
the 'decoder' to mammalian cells presents additional challenges. It is
considerably more difficult to isolate functionally 'targetless' cells.
Strategies involving titratable promoters, known specific inhibitors,
anti-sense RNAs, ribozymes, and methods of targeting specific proteins for
degradation are possible and should be tested. Another limitation is that
not all cell types express the same set of genes, and therefore
'off-target' effects may be different in different cell types. In
addition, applying the 'decoder' to human cells will also require
technical improvements that allow expression profiling of a small number
of cells. Even the broader question of whether the insensitivity of
'off-target' signatures to the disruption of the main target is the
exception or the rule can only be answered by the accumulation of more
data. Barkai and Leibler, however, have argued in favor of robustness of
biological systems, indicating that drug perturbations ('off-target'
signatures) may be robust even when the system is subjected to another
perturbation (such as a genetic disruption) (ref. 28). Many practical
developments will be necessary if the 'decoder' concept is to be broadly
applied.

Expression arrays have been used mainly as an initial screen for genes
induced in a particular tissue or process of interest by focusing on genes
with large expression ratios. We have found, however, that effort to
refine experimental protocols and repeat experiments increases the
reliability of the data and permits new applications. For example, it
provides a larger set
Table 3 Yeast strains used

Strain   Relevant genotype                                                                             Reference
YPH499   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1                                     (34)
R563     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                          (this study)
R558     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                          (this study)
R567     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                          (this study)
MCY300   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3           (21)
R132     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanr (this study)
R133     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanr (this study)
R559     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2               (this study)
BY4719   Mata trp1-Δ63 ura3-Δ0                                                                         (35)
BY4738   Matα trp1-Δ63 ura3-Δ0                                                                         (35)
R491     Mata/α BY4719 × BY4738                                                                        (this study)
BY4728   Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                               (35)
BY4729   Matα his3-Δ200 trp1-Δ63 ura3-Δ0                                                               (35)
R1226    Mata/α BY4728 × BY4729                                                                        (this study)



of genes at higher confidence levels that serve as a more 
unique signature for a given protein perturbation. In addition, 
it allows subtle signatures to be detected, when, for example, a 
protein is only partially inhibited. This may enable clinical 
monitoring of small changes in protein function in disease or 
toxicity states before they could otherwise be detected. 
Because the functions of many genes detected on transcript ar- 
rays are known, these microarrays are powerful tools that pro- 
vide detailed information about a cell's physiology. For 
example, changes in the flux through a metabolic pathway are 
reflected in transcriptional changes in genes in the pathway (ref. 7).
Furthermore, it may be possible to indirectly measure protein
activity levels from expression profiling data (S.F. et al.,
unpublished data). Thus, although the eventual development of
genomic methods allowing the direct measurement of all cel- 
lular protein levels will be an important achievement, tran- 
script array technology offers an immediate and robust means 
of evaluating the effects of various treatments on gene expres- 
sion and protein function. 

Methods 

Construction, growth and drug treatment of yeast strains. The strains
used in this study (Table 3) were constructed by standard techniques.
To construct strain R559, strain R563 was transformed to Leu+ with
plasmid pM12 digested by SalI and MluI (provided by A. Hinnebusch and T.
Dever). Strains R132 and R133 were constructed by transforming the
bacterial kanamycin resistance cassette (ref. 10) flanked by genomic DNA
from the CPH1 and FPR1 loci, respectively, and selecting for
G418-resistant colonies. For experiments with FK506, cells were grown for
three generations to a density of 1 × 10[exponent illegible] cells/ml in
YAPD medium (YPD plus 0.004% adenine) supplemented with 10 mM calcium
chloride as described (ref. 11). Where indicated, FK506 was added to a
final concentration of 1 µg/ml 0.5 h after inoculation of the culture or
to 50 µg/ml 1 h before cells were collected. CsA was used at a final
concentration of 50 µg/ml. Cells were broken by standard procedures with
the following modifications: Cell pellets were resuspended in breaking
buffer (0.2 M Tris-HCl pH 7.6, 0.5 M NaCl, 10 mM EDTA, 1% SDS), vortexed
for 2 min on a VWR multi-tube vortexer at setting 8 in the presence of 60%
glass beads (425-600 µm mesh; Sigma) and phenol:chloroform (50:50,
volume/volume). After separation of the phases, the aqueous phase was
re-extracted and ethanol-precipitated. Poly A+ RNA was isolated by two
sequential chromatographic purifications over oligo dT cellulose (New
England Biolabs, Beverly, Massachusetts) using established protocols.

For experiments using 3-AT, wild-type or his3/his3 cells were grown to
early logarithmic phase in SC medium, pelleted and resuspended in SC
medium lacking histidine for 1 h in the presence or absence of 10 mM
3-AT, as indicated. Cells were harvested and mRNA isolated as above.
FK506 was obtained from the Swedish Hospital Pharmacy (Seattle,
Washington) and purified to homogeneity by ethyl acetate extraction by
J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington).
CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample. Fluorescently
labeled cDNA was prepared, purified and hybridized essentially as
described. Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA
during reverse transcription (Superscript II; Life Technologies) and
purified by concentrating to less than 10 µl using Microcon-30
microconcentrators (Amicon, Houston, Texas). Paired cDNAs were
resuspended in 20-26 µl hybridization solution (3× SSC, 0.75 µg/ml polyA
DNA, 0.2% SDS) and applied to the microarray under a 22- × 30-mm
coverslip for 6 h at 63 °C, all according to a published method.

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville, Alabama)
were used as templates with amino-modified forward primer and
unmodified reverse primers to PCR amplify 6,065 ORFs from the S.
cerevisiae genome. Our first-pass success rate was 94%. Amplification
reactions that gave products of unexpected sizes were excluded from
subsequent analysis. ORFs that could not be amplified from purchased
templates were amplified from genomic DNA. DNA samples from 100-µl
reactions were isopropanol-precipitated, resuspended in water, brought to
a final concentration of 3× SSC in a total volume of 15 µl, and
transferred to 384-well microtiter plates (Genetix Limited, Christchurch,
Dorset, England). PCR products were spotted onto 1 × 3-inch
polylysine-treated glass slides by a robot built essentially according to
defined specifications (http://cmgm.stanford.edu/pbrown/MGuide). After
being printed, slides were processed according to published protocols.

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s in
the Cy5 channel (white light through Chroma 618-648 nm excitation
filter, Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel
(Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission
filter) were done consecutively in each frame before moving to the next,
spatially contiguous frame. Color isolation between the Cy3 and Cy5
channels was about 100:1 or better. Frames were 'knitted' together in
software to make the complete images. The intensity of spots (about
100 µm) was quantified from the 10-µm pixels by frame-by-frame
background subtraction and intensity averaging in each channel. Dynamic
range of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive error
level. Normalization between the channels was accomplished by
normalizing each channel to the mean intensities of all genes. This
procedure is nearly equivalent to normalization between channels using
the intensity ratio of genomic DNA spots, but is possibly more robust, as
it is based on the intensities of several thousand spots distributed over
the array.
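
The channel normalization described above can be illustrated with a
minimal sketch. It assumes background-subtracted, strictly positive spot
intensities held in plain Python lists; the function names and the
example intensities are hypothetical and are not taken from the authors'
software.

```python
import math

def normalize_channels(cy5, cy3):
    """Scale each channel so that its mean spot intensity equals 1.

    cy5, cy3: background-subtracted intensities, one value per spot,
    listed in the same spot order (an assumption of this sketch).
    """
    mean5 = sum(cy5) / len(cy5)
    mean3 = sum(cy3) / len(cy3)
    return [v / mean5 for v in cy5], [v / mean3 for v in cy3]

def log10_ratios(cy5, cy3):
    """Return log10(Cy5/Cy3) per spot after mean normalization."""
    norm5, norm3 = normalize_channels(cy5, cy3)
    return [math.log10(r / g) for r, g in zip(norm5, norm3)]

# Hypothetical intensities: the Cy5 channel is uniformly twice as bright
# as Cy3, a pure channel imbalance with no differential expression.
cy5 = [200.0, 400.0, 800.0, 600.0]
cy3 = [100.0, 200.0, 400.0, 300.0]
ratios = log10_ratios(cy5, cy3)
```

Because each channel is divided by its own mean, a uniform brightness
difference between the two dyes cancels and the log ratios in this
example are all zero.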

Signature correlation coefficients and their confidence limits.
Correlation coefficients between the signature ORFs of various
experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in the x
signature, and y_k is the log10 of the expression ratio for the kth gene
in the y signature. The summation is over those genes that were either
up- or down-regulated in either experiment at the 95% confidence level.
These genes each had a less than 5% chance of being actually unregulated
(having expression ratios departing from unity due to measurement errors
alone). This confidence level was assigned based on an error model which
assigns a lognormal probability distribution to each gene's expression
ratio with characteristic width based on the observed scatter in its
repeated measurements (repeated arrays at the same nominal experimental
conditions) and on the individual array hybridization quality. This latter
dependence was derived from control experiments in which both Cy3
and Cy5 samples were derived from the same RNA sample. For large
numbers of repeated measurements the error reduces to the observed
scatter. For a single measurement the error is based on the array quality
and the spot intensity.
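
The correlation defined above can be sketched in a few lines. This is an
illustration only: it assumes each signature is a dictionary mapping a
gene name to its log10 expression ratio, and that the set of genes
regulated at the 95% confidence level in either experiment has already
been determined by the error model. The function name and example values
are hypothetical.

```python
import math

def signature_correlation(x, y, significant):
    """Uncentered correlation of two signatures over selected genes.

    x, y: dicts mapping gene name -> log10 expression ratio.
    significant: genes up- or down-regulated in either experiment;
    only genes measured in both signatures are used.
    """
    genes = [g for g in significant if g in x and g in y]
    sum_xy = sum(x[g] * y[g] for g in genes)
    sum_xx = sum(x[g] ** 2 for g in genes)
    sum_yy = sum(y[g] ** 2 for g in genes)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("a signature has no nonzero log ratios")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical log10 ratios. Gene C is unregulated noise and is left out
# of the sum because it is not in the significant set.
drug = {"A": 1.0, "B": -0.5, "C": 0.02}
mutant = {"A": 0.8, "B": -0.4, "C": -0.01}
rho = signature_correlation(drug, mutant, {"A", "B"})
```

The means are not subtracted, matching the formula as written, so a
signature compared with itself gives exactly 1 and two signatures that
differ only by a positive scale factor also give 1.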

Random measurement errors in the x and y signatures tend to bias the
correlation towards zero. In most experiments, most genes are not
significantly affected but do show small random measurement errors.
Selecting only the '95% confidence' genes for the correlation
calculation, rather than the entire genome, reduces this bias and makes
the actual biological correlations more apparent.

Correlations between a profile and itself are unity by definition. Error
limits on the correlation are 95% confidence limits based on the
individual measurement error bars, and assuming uncorrelated errors. They
do not include the bias mentioned above; thus, a departure of ρ from unity
does not necessarily mean that the underlying biological correlation is
imperfect. However, a correlation of 0.7 ± 0.1, for example, is very
significantly different from zero. Small (magnitude of ρ < 0.2) but
formally significant correlations in the tables and text probably are due
to small systematic biases in the Cy5/Cy3 ratios that violate the
assumption of independent measurement errors used to generate the 95%
confidence limits. Therefore, these small correlation values should be
treated as not significant. A likely source of uncorrected systematic bias
is the partially corrected scanner detector nonlinearity that differently
affects the Cy3 and Cy5 detection channels.

The 1 µg/ml FK506 treatment signature was compared with more
than 40 unrelated deletion mutant strain or drug signatures. These
control profiles had correlation coefficients with the FK506 profile that
were distributed around zero (mean ρ = -0.03) with a standard deviation of
0.16 (data not shown), and none had correlations greater than ρ = 0.38.
Similarly, the calcineurin mutant strain signature correlated well with
the CsA treatment signature (ρ = 0.71 ± 0.04) but not with the signatures
from the negative controls (mean ρ = -0.02 with a standard deviation of
0.18).



Quality controls. End-to-end checks on expression ratio measurement
accuracy were provided by analyzing the variance in repeated
hybridizations using the same mRNA labeled with both Cy3 and Cy5, and also
using Cy3 and Cy5 mRNA samples isolated from independent cultures of
the same nominal strain and conditions. Biases undetected with this
procedure, such as gene-specific biases presumably due to differential
incorporation of Cy3- and Cy5-dUTP into cDNA, were minimized by doing
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling of
the biological conditions was reversed in one experiment with respect to
the other. The expression ratio for each gene is then the ratio of ratios
between the two experiments in the pair. Other biases are removed by
algorithmic numerical de-trending. The magnitude of these biases in the
absence of de-trending and fluor reversal is typically about 30% in the
ratio, but may be as high as twofold for some ORFs.

Expression ratios are based on mean intensities over each spot. Some
smaller spots have fewer image pixels in the average. This does not
degrade accuracy noticeably until the number of pixels falls below ten, in
which case the spot is rejected from the data set. 'Wander' of spot
positions with respect to the nominal grid is adaptively tracked in array
subregions by the image processing software. Unequal spot 'wander' within
a subregion greater than half-a-spot spacing is a difficulty for the
automated quantitating algorithms; in this case, the spot is rejected from
analysis based on human inspection of the 'wander'. Any spots partially
overlapping are excluded from the data set. Less than 1% of spots
typically are rejected for these reasons.
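
The fluor-reversal step described under Quality controls can be
illustrated with a short sketch. It assumes a simple multiplicative
dye bias that is the same in both hybridizations of a pair. Taking the
square root of the ratio of ratios, so that the result stays on the scale
of a single expression ratio, is an assumption of this sketch and is not
stated in the text. All names and numbers are hypothetical.

```python
import math

def combine_fluor_reversed(ratio_forward, ratio_reversed):
    """Combine a fluor-reversed pair of Cy5/Cy3 ratios for one gene.

    ratio_forward:  Cy5/Cy3 with the treated sample labeled with Cy5.
    ratio_reversed: Cy5/Cy3 with the treated sample labeled with Cy3.

    With a gene-specific dye bias b and true treated/control ratio t,
    the forward measurement is b * t and the reversed one is b / t.
    Their ratio is t squared, so the bias cancels and the square root
    recovers t.
    """
    return math.sqrt(ratio_forward / ratio_reversed)

# Hypothetical gene: true twofold induction, 30% dye bias toward Cy5.
true_ratio, dye_bias = 2.0, 1.3
forward = dye_bias * true_ratio    # treated sample in Cy5
reverse = dye_bias / true_ratio    # treated sample in Cy3
estimate = combine_fluor_reversed(forward, reverse)
```

In this model the estimate equals the true ratio regardless of the size
of the dye bias, which is the reason for pairing the hybridizations.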
The Transcriptional Program in
the Response of Human
Fibroblasts to Serum

Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross,
Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent,
Louis M. Staudt, James Hudson Jr., Mark S. Boguski,
Deval Lashkari, Dari Shalon, David Botstein, Patrick O. Brown*

The temporal program of gene expression during a model physiological
response of human cells, the response of fibroblasts to serum, was
explored with a complementary DNA microarray representing about 8600
different human genes. Genes could be clustered into groups on the basis
of their temporal patterns of expression in this program. Many features of
the transcriptional program appeared to be related to the physiology of
wound repair, suggesting that fibroblasts play a larger and richer role in
this complex multicellular response than had previously been appreciated.



The response of mammalian fibroblasts to serum has been used as a model
for studying growth control and cell cycle progression (1). Normal human
fibroblasts require growth factors for proliferation in culture; these
growth factors are usually provided by fetal bovine serum (FBS). In the
absence of growth factors, fibroblasts enter a nondividing state, termed
G0, characterized by low metabolic activity. Addition of FBS or purified
growth factors induces proliferation of the fibroblasts; the changes in
gene expression that accompany this proliferative response have been the
subject of many studies, and the responses of dozens of genes to serum
have been characterized.

We took a fresh look at the response of human fibroblasts to serum, using
cDNA microarrays representing about 8600 distinct human genes to observe
the temporal program of transcription that underlies this response.
Primary cultured fibroblasts from human neonatal foreskin were induced to
enter a quiescent state by serum deprivation for 48 hours and then
stimulated by addition of medium containing 10% FBS (2). DNA microarray
hybridization was used to measure the temporal changes in mRNA levels of
8613 human genes (3) at 12 times, ranging from 15 min to 24 hours after
serum stimulation. The cDNA made from purified mRNA from each sample was
labeled with the fluorescent dye Cy5 and mixed with a common reference
probe consisting of cDNA made from purified mRNA from the quiescent



Fig. 1. The same section of the microarray is shown for three independent
hybridizations comparing RNA isolated at the 8-hour time point after serum
treatment to RNA from serum-deprived cells. Each microarray contained 9996
elements, including 9804 human cDNAs, representing 8613 different genes.
mRNA from serum-deprived cells was used to prepare cDNA labeled with
Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested from cells at
different times after serum stimulation was used to prepare cDNA labeled
with Cy5-dUTP. The two cDNA probes were mixed and [...]. Genes whose mRNAs
are more abundant in the serum-deprived fibroblasts (that is, suppressed
by serum treatment) appear as green spots, and genes whose mRNAs are more
abundant in the serum-treated fibroblasts appear as red spots. Yellow
spots represent genes whose expression does not vary substantially between
the two [...]. The arrows indicate the spots representing the following
genes: 1, protein disulfide isomerase-related protein P5; 2, IL-8
precursor; 3, EST AA057170; and 4, vascular endothelial growth factor.
culture (time zero) labeled with a second fluorescent dye, Cy3 (4). The
color images of the hybridization results (Fig. 1) were made by
representing the Cy3 fluorescent image as green and the Cy5 fluorescent
image as red and merging the two color images.

Diverse temporal profiles of gene expression could be seen among the 8613
genes surveyed in this experiment (Fig. 2); many of these genes (about
half) were unnamed expressed sequence tags (ESTs) (5). Although diverse
patterns of expression were observed, the orderly choreography of the
expression program became apparent when the results were analyzed by a
clustering and display method developed in our laboratory for analyzing
genome-wide


Fig. 2. Cluster image showing the different classes of gene expression
profiles. Five hundred seventeen genes whose mRNA levels changed in
response to serum stimulation were selected (7). This subset of genes was
clustered hierarchically into groups on the basis of the similarity of
their expression profiles by the procedure of Eisen et al. (6). The
expression pattern of each gene in this set is displayed here as a
horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts
at the indicated time after serum stimulation ("unsync" denotes
exponentially growing cells) to its level in the serum-deprived (time
zero) fibroblasts is represented by a color, according to the color scale
at the bottom. The graphs show the average expression profiles for the
genes in the corresponding "cluster" (indicated by the letters A to J and
color coding). In every case examined, when a gene was represented by more
than one array element, the multiple representations in this set were seen
to have identical or very similar expression profiles, and the profiles
corresponding to these independent measurements clustered either adjacent
or very close to each other, pointing to the robustness of the clustering
algorithm in grouping genes with very similar patterns of expression.
gene expression data (6). An example of such an analysis, here applied to
a subset of 517 genes whose expression changed substantially in response
to serum (7), is shown in Fig. 2. The entire detailed data set underlying
Fig. 2 is available as a tab-delimited table (in cluster order) at the
Science Web site (www.sciencemag.org/feature/data/984559.shl). In
addition, the entire, larger data set for the complete set of genes
analyzed in this experiment can be found at a Web site maintained by our
laboratory (genome-www.stanford.edu/serum) (8).
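
Grouping genes by the similarity of their temporal profiles can be
illustrated with a small sketch. It is not the published procedure of
reference (6): it assumes Pearson correlation as the similarity measure
and merges clusters by the average of all pairwise correlations between
their members, a simplification chosen to keep the example short. The
gene names and values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles.

    Profiles must not be constant, or the denominator is zero.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def cluster(profiles):
    """Agglomerative clustering; returns a nested tuple of gene names.

    profiles: dict mapping gene name -> list of log ratios over time.
    """
    # Each entry is (tree, list of member gene names).
    clusters = [(name, [name]) for name in profiles]

    def similarity(members_1, members_2):
        pairs = [pearson(profiles[a], profiles[b])
                 for a in members_1 for b in members_2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i][1], clusters[j][1])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical profiles: two transiently induced genes, two late genes.
profiles = {
    "early_1": [0.0, 2.0, 1.0, 0.0],
    "early_2": [0.0, 1.8, 1.1, 0.1],
    "late_1": [0.0, 0.0, 1.0, 2.0],
    "late_2": [0.1, 0.0, 0.9, 2.2],
}
tree = cluster(profiles)
```

The returned tree pairs the two early profiles and the two late profiles
before joining the two groups, which is the kind of ordering a cluster
image displays.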

One measure of the reliability of the 
changes we observed is inherent in the ex- 
pression profiles of the genes. For most genes 
whose expression levels changed, we could 
see a gradual change over a few time points, 
which thus effectively provided independent 
measurements for almost all of the observa- 
tions. An additional check was provided by 
the inclusion of duplicate and, in a few cases, 
multiple array elements representing the 
same gene for about 5% of the genes included 
in this microarray. In addition, three indepen-
dent hybridizations to different microarrays
with mRNA samples from cells harvested 8 
hours after serum addition showed good cor- 
relation (Fig. 1 ). As an independent test, we 
measured the expression levels of several 
genes using the TaqMan 5' nuclease fluoro-
genic quantitative polymerase chain reaction
(PCR) assay (9). The expression profiles of 
the genes, as measured by these two indepen-
dent methods, were very similar (Fig. 3) (10).

The transcriptional response of fibroblasts 
to serum was extremely rapid. The immediate 
response to serum stimulation was dominated 
by genes that encode transcription factors 
and other proteins involved in signal trans-
duction. The mRNAs for several genes [in-
cluding c-FOS, JUN B, and mitogen-acti-
vated protein (MAP) kinase phosphatase-1
(MKP1)] were detectably induced within 
15 min after serum stimulation (Fig. 4, A 
and B). Fifteen of the genes that were 
observed to be induced by serum encode 
known or suspected regulators of transcrip-
tion (Fig. 4B). All but one were immediate-
early genes; their induction was not inhib-
ited by cycloheximide (11). This class of
genes could be distinguished into those 
whose induction was transient (Fig. 2, clus- 
ter E) and those whose mRNA levels re- 
mained induced for much longer (Fig. 2, 
clusters I and J). Some features of the 
immediate response appeared to be directed 
at adaptation to the initiating signals. We 
observed a marked induction of mRNA
encoding MKP1, a dual-specificity phos-
phatase that modulates the activity of the
ERK1 and ERK2 MAP kinases (12). The
coincidence of the peak of expression of
genes in cluster E (Fig. 2) with that of
MKP1 (Fig. 4A) suggests the possibility
that continued activity of the MAP kinase path-
way is required to maintain induction of these
genes but not of those with sustained expression 
(clusters I and J). The gene encoding a second 
member of the dual-specificity MAP kinase
phosphatase family, known as dual-specificity
protein phosphatase 6/pyst2, was induced later,
at about 4 hours after serum stimulation. Genes 
encoding diverse other proteins with roles in 
signal transduction, ranging from cell-surface 
receptors [for example, the sphingosine 1-
phosphate receptor (EDG-1), the vascular en-
dothelial growth factor receptor, and the type II
BMP receptor] to regulators of G-protein sig-
naling (for example, NET1/p115 rho GEF) to
DNA-binding transcription factors, were in- 
duced by serum (Fig. 4A). 

The reprogramming of the regulatory cir-
cuits in response to serum involved not only
induction of transcription factors but also re- 
duced expression of many transcriptional reg- 
ulators—some of which may play roles in 
maintaining the cells in G0 or in priming
them to react to wounding (Fig. 4C). Perhaps
as a consequence of the historical focus on 
genes induced by serum stimulation of fibro- 
blasts, the set of transcription factors whose 
expression diminished upon serum stimula- 
tion has been less well characterized. 

Genes known or likely to be involved in 
controlling and mediating the proliferative re-
sponse showed distinctive patterns of regula-
tion. Several genes whose products inhibit pro-
gression of the cell-division cycle, such as p27
Kip1, p57 Kip2, and p18, were expressed in the
quiescent fibroblasts and down-regulated be- 
fore the onset of cell division. The nadir in the 
mRNA levels for these genes occurred between 
6 and 12 hours after serum stimulation (Fig. 
5A), coincident with the passage of the fibro-
blasts through G1. The levels of the transcript
encoding the WEE1-like protein kinase, which
is believed to inhibit mitosis by phosphoryl-
ation of Cdc2, diminished between 4 and 8 to
12 hours after serum addition (Fig. 5A), well
before the onset of M phase at around 16 hours,
raising the possibility of an additional role for
Wee1 in an earlier stage of the cell cycle or in
regulating the G0 to G1 transition. Several
genes induced in the first few hours after serum
stimulation, such as the helix-loop-helix pro-
teins ID2 and ID3 and EST AA016305, a gene
with homology to G1-S cyclins, are candidates
for roles in promoting the exit from G0.

Genes involved in mediating progression 
through the cell cycle were characterized by a 
distinctive pattern of expression (Fig. 2, clus- 
ter D), reflecting the coincidence of their 
expression with the reentry of the stimulated 
fibroblasts into the cell-division cycle. The 
stimulated fibroblasts replicated their DNA 
about 16 hours after serum treatment. This 
timing was reflected by the induction of 
mRNA encoding both subunits of ribonucle- 
otide reductase and PCNA, the processivity 
factor for DNA polymerase epsilon and delta. 
Cyclin A, Cyclin B1, Cdc2, and CDC28 ki-
nase, regulators of passage through the S
phase and the transition from G2 to M phase,
were induced at about 16 to 20 hours after
serum addition. The kinase in the Cyclin
B1-CDK pair needs to be activated by phos-
phorylation. The gene encoding Cyclin-de-
pendent kinase 7 (CDK7; a homolog of Xe-
nopus MO15 cdk-activating kinase) was in-
duced in parallel with the Cdc2 and Cdc28
kinases (Fig. 5A), suggesting a potential role
for CDK7 in mediating M phase. DNA topo-
isomerase IIα, required for chromosome seg-
regation at mitosis; Mad2, a component of
the spindle checkpoint that prevents comple- 
tion of mitosis (anaphase) if chromosomes 
are not attached to the spindle; and the kinet- 
ochore protein CENP-F all showed a similar 
expression profile. 

In the hours after the serum stimulus, one of
the most striking features of the unfolding tran- 
scriptional program was the appearance of nu- 
merous genes with known roles in processes 
relevant to the physiology of wound healing. 



These included both genes involved in the di- 
rect role played by fibroblasts in remodeling of 
the clot and the extracellular matrix and, more 
notably, genes encoding proteins involved in 
intercellular signaling (Fig. 5). Genes induced
in this program encode products that can (i)
participate in the dynamic process of clotting,
clot dissolution, and remodeling and perhaps
contribute to hemostasis by promoting local
vasoconstriction (for example, endothelin-1);
(ii) promote chemotaxis and activation of neu-
trophils (for example, COX2) and recruitment
and extravasation of monocytes and macro-
phages (for example, MCP1); (iii) promote
chemotaxis and activation of T lymphocytes
[for example, interleukin-8 (IL-8)] and B
lymphocytes (for example, ICAM-1), thus
providing both innate and antigen-specific
defenses against wound infection and recruit-
ing the phagocytic cells that will be required
ing the phagocytic cells that will be required 
to clear out the debris during remodeling of 
the wound; (iv) promote angiogenesis and 
neovascularization (for example, VEGF) 
through newly forming tissue; (v) promote 
migration and proliferation of fibroblasts (for 
example, CTGF) and their differentiation into
myofibroblasts (for example, Vimentin); and 
(vi) promote migration and proliferation of 
keratinocytes, leading to reepithelialization 
of the wound (for example, FGF7), and pro- 
mote proliferation of melanocytes, perhaps 
contributing to wound hyperpigmentation 
(for example, FGF2). 

Coordinated regulation of groups of genes 
whose products act at different steps in a 
common process was a recurring theme. For 
example, Furin, a prohormone-processing 
protease required for one of the processing 
steps in the generation of active endothelin, 
was induced in parallel with induction of the 
gene encoding the precursor of endothelin-1
(Fig. 5E) (13). Conversely, expression of
CALLA/CD10, a membrane metalloprotease
that degrades endothelin-1 and other peptide
mediators of acute inflammation, was re-
duced. A second example is provided by a set
of five genes involved in the biosynthesis of
cholesterol (Fig. 5I). The mRNAs encoding
each of these enzymes showed sharply dimin-
ished expression beginning 4 to 6 hours after
serum stimulation of fibroblasts. A likely ex-
planation for the coordinated down-regula-
tion of the cholesterol biosynthetic pathway
is that serum provides cholesterol to fibro-
blasts through low-density lipoproteins,
whereas in the absence of the cholesterol
provided by serum, endogenous cholesterol
biosynthesis in fibroblasts is required.

[Fig. 3 caption, fragment: ... time zero, so that the results could be compared with those in the
microarray hybridizations; in general, quantitation with the two methods
gave very similar results (10).]

Many of the previously studied genes that 
we observed to be regulated in this program 
have no recognized role in any aspect of wound 
healing or fibroblast proliferation. Their identi- 
fication in this study may therefore point to 
previously unknown aspects of these processes. 
A few selected genes in this group are shown in 
Fig. 5H. The stanniocalcin gene, for example 
(Fig. 5H), encodes a secreted protein without a
clearly identified function in human cells (14,
15). Its induction in serum-stimulated fibro-
blasts suggests the possibility that it may play a
role in the wound-healing process, perhaps
serving as a signal in mediating inflammation
or angiogenesis.

[Figure 4 panel titles: Immediate-early transcription factors; C, Other transcription factors. Color scale: fold repression (>8, >6, >4, >2) and fold induction (>2, >4, >6, >8).]

Fig. 4. "Reprogramming" of fibroblasts. Expres-
sion profiles of genes whose function is likely to
play a role in the reprogramming phase of the
response are shown with the same representa-
tion as in Fig. 2. In the cases in which a gene
was represented by more than one element in
the microarray, all measurements are shown.
The genes were grouped into categories on the
basis of our knowledge of their most likely role.
Some genes with pleiotropic roles were includ-
ed in more than one category.

One of the most important results of this 
exploration was the discovery of over 200 pre-
viously unknown genes whose expression was
regulated in specific temporal patterns during
the response of fibroblasts to serum. For exam-
ple, 13 of the 40 genes in cluster D (Fig. 2) have
descriptive names that reflect their putative 
function. Nine of these 13 genes (69%) encode 
proteins that play roles in cell cycle progres- 
sion, particularly in DNA replication and the 
G2-M transition. This enrichment for cell
cycle-related genes suggests that some of the
unnamed genes in this cluster, for example
EST W79311 and EST R13146, neither of
which has sequence similarity to previously
characterized genes, may represent previously
unknown genes involved in this part of the cell
cycle. Similarly, a remarkable fraction of genes
that were grouped into cluster F on the basis of
their expression profiles encoded proteins in-
volved in intercellular signaling (Fig. 2), sug-
gesting that a similar role should be considered
for the many unnamed genes in this cluster. A 
disproportionately large fraction of the genes 
whose transcription diminished upon serum 
stimulation were unnamed ESTs. 
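
The cluster D figures quoted above (40 genes, 13 with descriptive names, 9 of those with cell cycle roles) can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name and the dictionary keys are hypothetical and do not come from the paper.

```python
def annotation_summary(cluster_size, named, named_in_category):
    """Summarize how many genes in a cluster are named and how many of
    the named genes fall into one functional category.

    The counts are supplied by the caller; nothing is looked up.
    """
    if not 0 <= named_in_category <= named <= cluster_size:
        raise ValueError("counts must satisfy 0 <= in_category <= named <= size")
    if named == 0:
        raise ValueError("at least one named gene is needed for a percentage")
    return {
        # Genes with no descriptive name (for example, ESTs).
        "unnamed": cluster_size - named,
        # Share of the named genes that belong to the category, in percent.
        "percent_named_in_category": round(100 * named_in_category / named),
    }


# Cluster D as described in the text: 40 genes, 13 named, 9 cell cycle related.
summary = annotation_summary(40, 13, 9)
```

Under these counts the remaining 27 genes are the unnamed candidates for which a related role is being suggested.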

Our intention was to use this experiment as 
a model to study the control of the transition
from G0 to a proliferating state. However, one
of the defining characteristics of genome-scale
expression profiling experiments is that the ex- 
amination of so many diverse genes opens a 
window on all the processes that actually occur 
and not merely the single process one intended 
to observe. Serum, the soluble fraction of clot- 
ted blood, is normally encountered by cells in 
vivo in the context of a wound. Indeed, the
expression program that we observed in re- 
sponse to serum suggests that fibroblasts are 
programmed to interpret the abrupt exposure to 
serum not as a general mitogenic stimulus but 
as a specific physiological signal, signifying a 
wound. The proliferative response that we orig-
inally intended to study appeared to be part of a
larger physiological response of fibroblasts to a 
wound. Other features of the transcriptional 
response to serum suggest that the fibroblast is 
an active participant in a conversation among 
the diverse cells that work together in wound 
repair, interpreting, amplifying, modifying, and 
broadcasting signals controlling inflammation, 
angiogenesis, and epithelial regrowth during 
the response to an injury. 

We recognize that these in vitro results 
almost certainly represent a distorted and in- 
complete rendering of the normal physiolog- 
ical response of a fibroblast to a wound. 
Moreover, only the responses elicited directly 
by exposure of fibroblasts to serum were 
examined. The subsequent signals from other 
cellular participants in the normal wound- 
healing process would certainly provoke fur- 
ther evolution of the transcriptional program 
in fibroblasts at the site of a wound, which 
this experiment cannot reveal. Nevertheless 
we believe that the picture that emerged 
strongly suggests a much larger and richer 
role for the fibroblast in the orchestration of 
this important physiological process than had 
previously been suspected.

Systematic variation in gene expression
patterns in human cancer cell lines

Douglas T. Ross, Uwe Scherf, Michael B. Eisen, Charles M. Perou, Christian Rees, Paul Spellman,
Vishwanath Iyer, Stefanie S. Jeffrey, Matt Van de Rijn, Mark Waltham, Alexander Pergamenschikov,

Timothy G. Myers, John N. Weinstein, David Botstein

We used cDNA microarrays to explore the variation in expression of approximately 8,000 unique genes among the
60 cell lines used in the National Cancer Institute's screen for anticancer drugs. Classification of the cell lines based
solely on the observed patterns of gene expression revealed a correspondence to the ostensible origins of the
tumours from which the cell lines were derived. The consistent relationship between the gene expression patterns
and the tissue of origin allowed us to recognize outliers whose previous classification appeared incorrect. Specific
features of the gene expression patterns appeared to be related to physiological properties of the cell lines, such
as their doubling time in culture, drug metabolism or the interferon response. Comparison of gene expression pat-
terns in the cell lines to those observed in normal breast tissue or in breast tumour specimens revealed features of
the expression patterns in the tumours that had recognizable counterparts in specific cell lines, reflecting the
tumour, stromal and inflammatory components of the tumour tissue. These results provided a novel molecular
characterization of this important group of human cell lines and their relationships to tumours in vivo.



Introduction 

Cell lines derived from human tumours have been extensively used 
as experimental models of neoplastic disease. Although such cell 
lines differ from both normal and cancerous tissue, the inaccessi- 
bility of human tumours and normal tissue makes it likely that 
such cell lines will continue to be used as experimental models for
the foreseeable future. The National Cancer Institute's Develop-
mental Therapeutics Program (DTP) has carried out intensive
studies of 60 cancer cell lines (the NCI60) derived from tumours
from a variety of tissues and organs 1-4. The DTP has assessed many
molecular features of the cells related to cancer and chemothera- 
peutic sensitivity, and has measured the sensitivities of these 60 cell 
lines to more than 70,000 different chemical compounds, includ- 
ing all common chemotherapeutics (http://dtp.nci.nih.gov). A 
previous analysis of these data revealed a connection between the 
pattern of activity of a drug and its method of action. In particular, 
there was a tendency for groups of drugs with similar patterns of 
activity to have related methods of action 3,5-7.

We used DNA microarrays to survey the variation in abun- 
dance of approximately 8,000 distinct human transcripts in these 
60 cell lines. Because of the logical connection between the func-
tion of a gene and its pattern of expression, the correlation of gene
expression patterns with the variation in the phenotype of the cell
can begin the process by which the function of a gene can be 
inferred. Similarly, the patterns of expression of known genes can
reveal novel phenotypic aspects of the cells and tissues studied 8-10.
Here we present an analysis of the observed patterns of gene
expression and their relationship to phenotypic properties of the
60 cell lines. The accompanying report 11 explores the relationship
between the gene expression patterns and the drug sensitivity pro-
files measured by the DTP. The assessment of gene expression pat-
terns in a multitude of cell and tissue types, such as the diverse set
of cell lines we studied here, under diverse conditions in vitro and
in vivo, should lead to increasingly detailed maps of the human
gene expression program and provide clues as to the physiological
roles of uncharacterized genes 12-16. The databases, plus tools for
analysis and visualization of the data, are available (http://genome- 
www.stanford.edu/nci60 and http://discover.nci.nih.gov). 

Results 

We studied gene expression in the 60 cell lines using DNA 
microarrays prepared by robotically spotting 9,703 human 
cDNAs on glass microscope slides 17,18. The cDNAs included
approximately 8,000 different genes: approximately 3,700 repre- 
sented previously characterized human proteins, an additional 
1,900 had homologues in other organisms and the remaining 
2,400 were identified only by ESTs. Due to ambiguity of the iden- 
tity of the cDNA clones used in these studies, we estimated that 
approximately 80% of the genes in these experiments were cor-
rectly identified. The identities of approximately 3,000 cDNAs
from these experiments have been sequence-verified, including
all of those referred to here by name.
Each hybridization compared Cy5-labelled cDNA reverse tran-
scribed from mRNA isolated from one of the cell lines with Cy3-
labelled cDNA reverse transcribed from a reference mRNA
sample. This reference sample, used in all hybridizations, was
prepared by combining an equal mixture of mRNA from 12 of
the cell lines (chosen to maximize diversity in gene expression as
determined primarily from two-dimensional gel studies 2). By
comparing cDNA from each cell line with a common reference,
variation in gene expression across the 60 cell lines could be
inferred from the observed variation in the normalized Cy5/Cy3
ratios across the hybridizations.
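
The common-reference design can be sketched in a few lines. The sketch below assumes background-subtracted, strictly positive Cy5 and Cy3 intensities per spot and uses median centring as the normalization; the text does not say which normalization was actually applied, so that choice, like the function names, is an assumption made for illustration.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot, shifted so the array median is 0.

    cy5: sample-channel intensities (one cell line).
    cy3: reference-channel intensities (pooled reference mRNA).
    Median centring is an assumed normalization, not the paper's.
    """
    if len(cy5) != len(cy3):
        raise ValueError("both channels must cover the same spots")
    raw = [math.log2(sample / reference) for sample, reference in zip(cy5, cy3)]
    centre = median(raw)
    return [value - centre for value in raw]


def relative_expression(log_ratio_a, log_ratio_b):
    """Log2 difference between two cell lines for one gene.

    Because both ratios share the same reference denominator, the
    reference cancels and the difference compares the lines directly.
    """
    return log_ratio_a - log_ratio_b


# Two hypothetical three-spot arrays hybridized against the same reference.
line_a = normalized_log_ratios([400.0, 100.0, 50.0], [100.0, 100.0, 100.0])
line_b = normalized_log_ratios([100.0, 100.0, 200.0], [100.0, 100.0, 100.0])
difference_gene_0 = relative_expression(line_a[0], line_b[0])
```

Working in log ratios keeps induction and repression symmetric around zero, which is why the difference of two log ratios is a convenient way to compare any pair of cell lines through the shared reference.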

To assess the contribution of artefactual sources of variation in
the experimentally measured expression patterns, K562 and
MCF7 cell lines were each grown in three independent cultures
and the entire process was carried out independently on mRNA
extracted from each culture. The variance in the triplicate fluo- 
rescence ratio measurements approached a minimum when the 
fluorescence signal was greater than approximately 0.4% of the 
measurable total signal dynamic range above background in 
either channel of the hybridization. We selected the subset of 
spots for which significant signal was present in both the numer- 
ator and denominator of the ratios by this criterion to identify 
the best-measured spots. The pair-wise correlation coefficients 
for the triplicates of the set of genes that passed this quality con-
trol level (6,992 spots included for the MCF7 samples and 6,161
spots for K562) ranged from 0.83 to 0.92 (for graphs and details
see http://genome-www.stanford.edu/nci60).
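
The quality filter and replicate comparison described above can be illustrated as follows. This is a sketch under stated assumptions: spots are given as (Cy5, Cy3) pairs of background-subtracted intensities, the dynamic range of 65535 is a hypothetical 16-bit scanner value, and Pearson correlation of log2 ratios is assumed as the correlation measure; none of these details is confirmed by the text.

```python
import math


def well_measured(spot, threshold):
    """True if both channels of a (cy5, cy3) spot exceed the threshold."""
    cy5, cy3 = spot
    return cy5 > threshold and cy3 > threshold


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1, rep2, threshold):
    """Correlate log2 ratios over spots well measured in both replicates.

    Returns (correlation, number_of_spots_used).
    """
    xs, ys = [], []
    for spot1, spot2 in zip(rep1, rep2):
        if well_measured(spot1, threshold) and well_measured(spot2, threshold):
            xs.append(math.log2(spot1[0] / spot1[1]))
            ys.append(math.log2(spot2[0] / spot2[1]))
    return pearson(xs, ys), len(xs)


# Threshold: 0.4% of an assumed 16-bit dynamic range (hypothetical value).
threshold = 0.004 * 65535
rep1 = [(4000.0, 1000.0), (1000.0, 1000.0), (500.0, 2000.0), (50.0, 30.0)]
rep2 = [(8000.0, 2000.0), (3000.0, 3000.0), (750.0, 3000.0), (900.0, 300.0)]
r, spots_used = replicate_correlation(rep1, rep2, threshold)
```

In this toy data the fourth spot is dropped because its first replicate falls below the threshold in both channels, so only three spots contribute to the coefficient.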

To make the orderly features in the data more apparent, we used
a hierarchical clustering algorithm and a pseudo-colour visu-
alization matrix. The object of the clustering was to group cell
lines with similar repertoires of expressed genes and to group
genes whose expression level varied among the 60 cell lines in a
similar manner. Clustering was performed twice using different
subsets of genes to assess the robustness of the analysis. In one case
(Fig. 1), we concentrated on those genes that showed the most
variation in expression among the 60 cell lines (1,167 total). A sec-
ond analysis (Fig. 2) included all spots that were thought to be well
measured in the reference set (6,831 spots).
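
The two steps just described, selecting the most variable genes and then grouping similar profiles hierarchically, can be sketched as below. This is a simplified illustration and not the authors' software: it assumes variance as the measure of variation and average linkage over pairwise Pearson correlations as the similarity rule, and the gene names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def most_variable(profiles, k):
    """Keep the k profiles with the largest variance across samples."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    ranked = sorted(profiles, key=lambda name: variance(profiles[name]), reverse=True)
    return {name: profiles[name] for name in ranked[:k]}


def average_linkage(profiles):
    """Agglomerative clustering on mean pairwise correlation.

    Returns (tree, merges): tree is a nested tuple of names and merges
    lists (label_a, label_b, similarity) in the order clusters were joined.
    """
    members = {name: [name] for name in profiles}  # cluster label -> gene names
    trees = {name: name for name in profiles}      # cluster label -> subtree
    merges = []
    while len(members) > 1:
        labels = sorted(members)
        best = None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                sims = [pearson(profiles[p], profiles[q])
                        for p in members[a] for q in members[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        joined = a + "+" + b
        members[joined] = members.pop(a) + members.pop(b)
        trees[joined] = (trees.pop(a), trees.pop(b))
        merges.append((a, b, score))
    (tree,) = trees.values()
    return tree, merges


# Hypothetical log-ratio profiles across four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 1.0],
    "flat": [1.0, 1.1, 1.0, 1.1],
}
selected = most_variable(profiles, 4)
tree, merges = average_linkage(selected)
```

The same routine applied to the transposed table (one profile per cell line) would group the cell lines instead of the genes, which is the two-dimensional use described in the text.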



Gene expression patterns related to the histologic 
origins of the cell lines

The most notable property of the clustered data was that cell lines 
with common presumptive tissues of origin grouped together 
(Figs 1a and 2). Cell lines derived from leukaemia, melanoma,
central nervous system, colon, renal and ovarian tissue were clus-
tered into independent terminal branches specific to their respec-
tive organ types with few exceptions. Cell lines derived from
non-small lung carcinoma and breast tumours were distributed
in multiple different terminal branches, suggesting that their gene
expression patterns were more heterogeneous.

Many of these coherent cell line clusters were distinguished by 
the specific expression of characteristic groups of genes 
(Fig. 3a-d). For example, a cluster of approximately 90 genes was
highly expressed in the melanoma-derived lines (Fig. 3c). This set
was enriched for genes with known roles in melanocyte biology,
including tyrosinase and dopachrome tautomerase (TYR and
DCT; subunits of an enzyme complex involved in melanin
synthesis), MART1 (MLANA; which is being investigated as a
target for immunotherapy of melanoma) and S100-β (S100B;
which has been used as an antigenic marker in the diagnosis of

Fig. 2 Gene expression patterns related to
other cell-line phenotypes. a, We applied 
two-dimensional hierarchical clustering to 
expression data from a set of 6,831 cDNAs
measured across the 64 cell lines. The 6,831
cDNAs were those with a minimum fluores-
cence signal intensity of approximately 0.4%
of the dynamic range above background in
the reference channel in each of the six
hybridizations used to establish reproducibil-
ity. This effectively selected those spots that
provided the most reliable ratio measure- 
ments and therefore identified a subset of 
genes useful for exploring patterns comprised 
of those whose variation in expression across 
the 60 cell lines was of moderate magnitude.
b, Cluster-ordered data table. c, Doubling
time of cell lines. Cell lines are given in cluster 
order. Values are plotted relative to the mean. 
Doubling times greater than the mean are 
shown in green, those with doubling time less 
than the mean are shown in red. d, Three
related gene clusters that were enriched for 
genes whose expression level variation was 
correlated with cell line proliferation rate. 
Each of the three gene clusters (clustered 
solely on the basis of their expression pat- 
terns) showed enrichment for sets of genes 
involved in distinct functional categories (for 
example, ribosomal genes versus genes
involved in pre-RNA splicing). e, Gene cluster
in which all characterized and sequence-veri-
fied cDNAs encode genes known to be regu-
lated by interferons. f, Gene cluster enriched
for genes that have been implicated in drug 
metabolism (indicated by asterisks). A further 
property of the gene clustering evident here 
and in fig. 2 is the strong tendency for redun- 
dant representations of the same gene to 
cluster immediately adjacent to one another, 
even within larger groups of genes with very 
similar expression patterns. In addition to 
illustrating the reproducibility and consis- 
tency of the measurements, and providing 
independent confirmation of many of our 
measurements, this property also demon- 
strates that these, and probably all, genes 
have nearly unique patterns of variation 
across the 60 cell lines. If this were not the 
case, and multiple genes had identical pat- 
terns of variation, we would not expect to be 
able to distinguish, by clustering on the basis 
of expression variation, duplicate copies of 
individual genes from the other genes with 
identical expression patterns.

melanoma). LOXIMVI, the seventh line designated as melanoma
in the NCI60, did not show this characteristic pattern. Although
isolated from a patient with melanoma, LOXIMVI has previously
been noted to lack melanin and other markers useful for identifi-
cation of melanoma cells 1.

Paradoxically, two related cell lines (MDA-MB435 and MDA- 
N), which were derived from a single patient with breast cancer 
and have been conventionally regarded as breast cancer cell lines, 
shared expression of the genes associated with melanoma. MDA- 
MB435 was isolated from a pleural effusion in a patient with 
metastatic ductal adenocarcinoma of the breast 24,25. It remains
possible that the origin of the cell line was a breast cancer, and that
its gene expression pattern is related to the neuroendocrine fea-
tures of some breast cancers 26. But our results suggest that this cell
line may have originated from a melanoma, raising the possibility 
that the patient had a co-existing occult melanoma. 

The higher-level organization of the cell-line tree— in which 
groups span cell lines from different tissue types— also reflected 
shared biological properties of the tissues from which the cell 
lines were derived. The carcinoma-derived cell lines were divided 
into major branches that separated those that expressed genes 
characteristic of epithelial cells from those that expressed genes 
more typical of stromal cells. A cluster of genes is shown (Fig. 3b)
that is most strongly expressed in cell lines derived from colon 
carcinomas, six of seven ovarian-derived cell lines and the two 
breast cancer lines positive for the oestrogen receptor. The named 
genes in this cluster have been implicated in several aspects of
epithelial cell biology 27 . The cluster was enriched for genes whose 
products are known to localize to the basolateral membrane of 
epithelial cells, including those encoding components of 
adherens complexes (for example, desmoplakin (DSP),
periplakin (PPL) and plakoglobin (JUP)), an epithelial-
expressed cell-cell adhesion molecule (M4S1) and a sodium/
hydrogen ion exchanger 28-31 (SLC9A1). It also contained genes
that encode putative transcriptional regulators of epithelial mor-
phogenesis, a human homologue of a Drosophila melanogaster
epithelial-expressed tumour suppressor (LLGL1) and a homeo-
box gene thought to control calcium-mediated adherence in
epithelial cells 32,33 (MSX2).

In contrast, a separate, major branch of the cell-line dendro-
gram (Fig. 1a) included all glioblastoma-derived cell lines, all
renal-cell-carcinoma-derived cell lines and the remaining carci- 
noma-derived lines. The characteristic set of genes expressed in 
this cluster included many whose products are involved in stro-
mal cell functions (Fig. 3d). Indeed, the two cell lines originally
described as 'sarcoma-like' in appearance (Hs578T, breast carci-
nosarcoma, and SF539, gliosarcoma) expressed most of these
genes 34,35. Although no single gene was uniformly characteristic
of this cluster, each cell line showed a distinctive pattern of
expression of genes encoding proteins with roles in synthesis or 
modification of the extracellular matrix (for example, caldesmon 
(CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase
(LOX) and collagen subtypes). Although the ovarian and most 
non-small-cell-lung-derived carcinomas expressed genes charac- 
teristic of both epithelial cells and stromal cells, they probably 
clustered with the CNS and renal cell carcinomas in this analysis 
because genes characteristically expressed in stromal cells were 
more abundantly represented in this gene set. 



Physiological variation reflected
in gene expression patterns

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring
clusters of genes whose variation in mRNA levels was not obvi-
ously attributable to cell or tissue type. We identified some gene
clusters that were enriched for genes involved in specific cellular
processes; the variation in their expression levels may reflect cor-
responding differences in activity of these processes in the cell
lines. For example, a cluster of 1,159 genes (Fig. 2a) included
many whose products are necessary for progression through the
cell cycle (such as CCNA1, MCM106 and MAD2L1), RNA pro-
cessing and translation machinery (such as RNA helicases,
hnRNPs and translation elongation factors) and traditional
pathologic markers used to identify proliferating cells (MKI67).
Within this large cluster were smaller clusters enriched for genes
with more specialized roles. One cluster was highly enriched for
numerous ribosomal genes, whereas another was more enriched
for genes encoding RNA-splicing factors. The variation in
expression of these ribosomal genes was significantly correlated
with variation in the cell doubling time (correlation coefficient of
0.54), supporting the notion that the genes in this cluster were
related to the cell lines' proliferation rate or growth rate.

In a smaller gene cluster (Fig. 2d), all of the named genes were
previously known to be regulated by interferons. Additional
groups of interferon-regulated genes showed distinct patterns of
expression (data not shown), suggesting that the NCI60 cell lines
exhibited variation in activity of interferon-response pathways,
which was reflected in gene expression patterns.

Another cluster (Fig. 2e) contained several genes encoding
proteins with possible interrelated roles in drug metabolism,
including glutamate-cysteine ligase (GLCLC, the enzyme respon-
sible for the rate-limiting step of glutathione synthesis), thiore-
doxin (TXN) and thioredoxin reductase (TXNRD1; enzymes
involved in regulating redox state in cells), and MRP1 (a drug
transporter known to efficiently transport glutathione-conju-
gated compounds). The elevated expression of this set of genes
in a subset of these cell lines may reflect selection for resistance to
chemotherapeutics.

Cell lines facilitate interpretation of gene expression
patterns in complex clinical samples 
Like many other types of cancer, tumours of the breast typically 
have a complex histological organization, with connective tissue 
and leukocytic infiltrates interwoven with tumour cells. To
explore the possibility that variation in gene expression in the
tumour cell lines might provide a framework for interpreting the
expression patterns in tumour specimens, we compared RNA
isolated from two breast cancer biopsy samples, a sample of nor-
mal breast tissue and the NCI60 cell lines derived from breast
cancers (excluding MDA-MB-435 and MDA-N) and leukaemias
(Fig. 4). This clustering highlighted features of the gene expres-
sion pattern shared between the cancer specimens and individual
cell lines derived from breast cancers and leukaemias.

The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19),
as well as most of the other 'epithelial' genes defined in the com-
plete NCI60 cell line cluster, were expressed in both of the biopsy
samples and the two breast-derived cell lines, MCF-7 and T47D,
expressing the oestrogen receptor, suggesting that these tran-
scripts originated in tumour cells with features similar to those of
luminal epithelial cells (Fig. 5a). Expression of a set of genes char-
acteristic of stromal cells, including collagen genes (COL3A1,
COL5A1 and COL6A1) and smooth muscle cell markers
(TAGLN), was a feature shared by the tumour sample and the
stromal-like cell lines Hs578T and BT549 (Fig. 5b). This feature
of the expression pattern seen in the tumour samples is likely to
be due to the stromal component of the tumour. The tumours
also shared expression of a set of genes (Fig. 5c) with the multiple
myeloma cell line (RPMI-8226), notably including
immunoglobulin genes, consistent with the presence of B cells 
in the tumour (this was confirmed by staining with anti- 
nature genetics • volume 24 • march 2000 



2000 Nature America Inc. • http://genetics.nature.com






[Fig. 3 panel labels: leukaemia cluster (6 ESTs); epithelial cluster (15 ESTs); melanoma cluster (16 ESTs); mesenchymal cluster (67 ESTs)]



Fig. 3 Gene clusters related to tissue characteristics in the cell
lines. [...] the cluster diagram in Fig. 2 showing gene clusters
enriched for genes expressed in cell lines of ostensibly similar
origins. a, Cluster of genes highly expressed in the
leukaemia-derived cell lines. [...] distinguish genes that were
expressed in most leukaemia-derived lines [...] exclusively in the
erythroblastoid line [...] cluster together). b, Cluster of genes
[...] (2/2). This set of genes was also moderately expressed in
[...] non-small-cell lung (4/6) lines, but was expressed [...]
cancer-derived lines. c, Cluster of genes highly expressed in most
melanoma-derived lines (6/7) and two related lines ostensibly
derived from [...] (MDA-MB435 and MDA-N). d, Cluster of genes
highly expressed in [...]. Names are shown only for all known
genes whose identities were independently re-verified by
sequencing. The number of sequence-validated [...]. The position
of names in the adjacent list only approximates their position in
the cluster [...]. [...] images with all gene names and accession
numbers are available [...].







[Figure labels: leukocyte cluster; stromal cluster; epithelial cluster; proliferation cluster; cell lines]



immunoglobulin antibodies; data not shown). Therefore distinct
sets of genes with co-varying expression among the samples
(Fig. 4, arrow) appear to represent distinct cell types that can be
distinguished in breast cancer tissue. A fourth cluster of genes,
more highly expressed in all of the cell lines than in any of the
clinical specimens, was enriched for genes present in the
'proliferation' cluster described above (Fig. 5d). The variation in
expression of these genes likely paralleled the difference in
proliferation rate between the rapidly cycling cultured cell lines
and the much more slowly dividing cells in tissues.

Discussion 

Newly available genomics tools allowed us to explore variation in 
gene expression on a genomic scale in 60 cell lines derived from 
diverse tumour tissues. We used a simple cluster analysis to iden- 
tify the prominent features in the gene expression patterns that 
appeared to reflect 'molecular signatures' of the tissue from 
which the cells originated. The histological characteristics of the 
cell lines that dominated the clustering were pervasive enough 
that similar relationships were revealed when alternative subsets 
of genes were selected for analysis. Additional features of the 
expression pattern may be related to variation in physiological 
attributes such as proliferation rate and activity of interferon- 
response pathways. 

The properties of the tumour-derived cell lines in this study 
have presumably all been shaped by selection for resistance to 
host defences and chemotherapeutics and for rapid proliferation 
in the tissue culture environment of synthetic growth media, fetal 
bovine serum and a polystyrene substratum. But the primary 
identifiable factor accounting for variation in gene expression 
patterns among these 60 cell lines was the identity of the tissue 
from which each cell line was ostensibly derived. For most of the 
cell lines we examined, neither physiological nor experimental 
adaptation for growth in culture was sufficient to overwrite the 
gene expression programs established during differentiation in 
vivo. Nevertheless, the prominence of mesenchymal features in
the cell lines isolated from glioblastomas and carcinomas may 
reflect a selection for the relative ease of establishment of cell 
lines expressing stromal characteristics, perhaps combined with 
physiological adaptation to tissue culture conditions 38-40.



Fig. 4 Gene expression patterns in breast cancer specimens and
cultured breast cancer and leukaemia cell lines. a, Two-dimensional
hierarchical clustering applied to gene expression data for [...]
cancer specimens, a lymph node metastasis from one patient [...].
The gene expression data from tissue specimens was clustered along
with expression data from [...] the NCI60 cell lines to explore
[...] patterns observed in specific lines could be identified in
the tissue samples. [...] indicate gene clusters (shown in detail
in Fig. 5) that may be related to specific cellular components of
the tumour specimens. b, [...] cancer specimen [...] stained with
anti-keratin antibodies, showing the complex mix of cell types
characteristically found in breast tumours. The arrows highlight
the different cellular components of this tissue specimen that
were distinguished by the gene expression cluster analysis (Fig. 5).



Biological themes linking genes with related expression pat- 
terns may be inferred in many cases from the shared attributes of 
known genes within the clusters. Uncharacterized cDNAs are 
likely to encode proteins that have roles similar to those of the 
known gene products with which they appear to be co-regulated. 
Still, for several clusters of genes, we were unable to discern a com- 
mon theme linking the identified members of the cluster. Further 
exploration of their variation in expression under more diverse 
conditions and more comprehensive investigation of the physiology
of the NCI60 cells may provide insight. The relationship of
the gene expression patterns to the drug sensitivity patterns
measured by the DTP is an example of linking variation in gene
expression with more subtle and diverse phenotypic variation.

The patterns of gene expression measured in the NCI60 cell 
lines provide a framework that helps to distinguish the cells that 
express specific sets of genes in the histologically complex breast 
cancer specimens 41. Although it is now feasible to analyse gene
expression in micro-dissected tumour specimens 42-45, this
observation suggests that it will be possible to explore and interpret
some of the biology of clinical tumour samples by sampling them
intact. As is useful in conventional morphological pathology, one
might be able to observe interactions between a tumour and its
microenvironment in this way. These relationships will be
clarified by suitable analysis of gene expression patterns from
intact as well as dissected tumours 12, 14, 15, 41.

Methods 

cDNA clones. We obtained the 9,703 human cDNA clones (Research
Genetics) used in these experiments as bacterial colonies in 96-well
microtitre plates. Approximately 8,000 distinct Unigene clusters
(representing nominally unique genes) were represented in this set
of clones. All genes identified here by name represent clones whose
identities were confirmed by re-sequencing, or by the criteria that
two or more independent cDNA clones ostensibly representing the
same gene had nearly identical gene expression patterns. A
single-pass 3' sequence re-verification was attempted for every
clone after re-streaking for single colonies. For a subset of genes
for which quality 3' sequence was not obtained, we attempted to
confirm identities by 5' sequencing. Of the subset of clones
selected for 5' sequence verification on the basis of an interesting
pattern of expression (888 total), 331 were correctly identified,
57 incorrectly identified, and 500 indeterminate (poor quality
sequence). We estimated that 15%-20% of array elements contained
DNA representing more than one clone per well. So far, the
identities of ~3,000 clones have been verified. The full list of
clones used and their nominal identities are available (gene names
preceded by the designation "SID*" (Stanford Identification)
represent clones whose identities have not yet been verified;
http://genome-www.stanford.edu:8000/nci60).
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Production of cDNA microarrays. The arrays used in this experiment were 
produced at Synteni Inc. (now Incyte Pharmaceuticals). Each insert was 
amplified from a bacterial colony by sampling 1 µl of bacterial media and
performing PCR amplification of the insert using consensus primers for
the vectors represented in the clone set (5'-TTGTAAAACGACGGCCAGTG-3',
5'-CACACAGGAAACAGCTATG-3'). Each PCR product
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(100 was purified by gel exclusion, concentrated and resusDended in 
3×SSC (10 µl). The PCR products were then printed on [...]
microscope slides using a robot with four printing tips. Detailed protocols
for assembling and operating a microarray printer, and printing and
experimental application of DNA microarrays, are available
(http://cmgm.stanford.edu/pbrown).

Preparation of mRNA and reference pool. Cell lines were grown from NCI
[...] supplemented with phenol red, glutamine
(2 mM) and [...] fetal calf serum. To minimize the contribution of variations
in culture conditions or cell density to differential gene expression, we grew
each cell line to 80% confluence and isolated mRNA 24 h after [...]
medium. The time between removal from the incubator and lysis of the
cells in RNA stabilization buffer was minimized (<1 min). Cells were lysed in
buffer containing guanidium isothiocyanate and total RNA was purified
with the RNeasy purification kit (Qiagen). [...] purified mRNA as needed
using a poly(A) purification kit (Oligotex, Qiagen) according to the
manufacturer's [...] integrity and relative contamination of mRNA with
ribosomal RNA [...].

The breast tumours were surgically excised from patients and [...]
transported to the pathology laboratory, where [...]
use. A frozen tumour specimen was removed from the freezer, cut into
small pieces (~50-100 mg each), immediately placed into 10 ml of Trizol
reagent (Gibco-BRL) and homogenized using a [...]
Homogenizer (Fisher Scientific), starting at 5,000 r.p.m. and [...]
tumour homogenate as described in the Trizol protocol [...]
step to remove fat. Once total RNA was obtained, [...]
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We combined mRNA from the following cells in equal quantities to 
make the reference pool: HL-60 (acute myeloid leukaemia) and K562 
(chronic myeloid leukaemia); NCI-H226 (non-small-cell lung); COLO
205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma);
OVCAR-3 and OVCAR-4 (ovarian); CAKI-1 (renal); PC-3 (prostate); and
MCF7 and Hs578T (breast). The criteria for selection of the cell lines in
the reference are described in detail in the accompanying manuscript.

Doubling-time calculations. We calculated doubling times based on routine
NCI60 cell line compound screening data; they reflect the doubling
times for cells inoculated into 96-well plates at the screening
inoculation densities and grown in RPMI 1640 medium supplemented with 5%
fetal bovine serum for 48 h. We measured cell populations using a
sulforhodamine B optical density measurement assay. The growth rate
constant k was calculated using the equation N/N_0 = e^{kt}, where N_0 is
the optical density for control (untreated) cells at time zero, N is the
optical density for control cells after 48-h incubation, and t is 48 h.
The same equation was then used with the derived k to calculate the
doubling time t by setting N/N_0 = 2. For a given cell line, we obtained
N_0 and N values by averaging optical densities (n > 6,000)
obtained for each cell line for a year's screening. Data and experimental
details are available (http://dtp.nci.nih.gov).
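The two-step calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the example optical densities are hypothetical, and only the relation N/N_0 = e^{kt} with t = 48 h comes from the text.

```python
import math

def doubling_time(n0: float, n: float, t_hours: float = 48.0) -> float:
    """Doubling time in hours from optical densities at time zero (n0) and at t (n)."""
    if n0 <= 0 or n <= n0:
        raise ValueError("requires 0 < n0 < n (net growth over the interval)")
    # Step 1: solve N/N0 = exp(k * t) for the rate constant k.
    k = math.log(n / n0) / t_hours
    # Step 2: set N/N0 = 2 in the same equation and solve for t.
    return math.log(2) / k

# Hypothetical averaged optical densities: a fourfold increase in 48 h.
print(round(doubling_time(0.25, 1.0), 1))  # 24.0
```

A fourfold increase over 48 h corresponds to two doublings, which is why the example returns 24 h.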



at http://rana.stanford.edu/software). Each spot was defined by manual
positioning of a grid of circles over the array image. For each fluorescent
image, the average pixel intensity within each circle was determined, and a
local background was computed for each spot equal to the median pixel
intensity in a square of 40 pixels in width and height centred on the spot
centre, excluding all pixels within any defined spots. Net signal was
determined by subtraction of this local background from the average intensity
for each spot. Spots deemed unsuitable for accurate quantitation because
of array artefacts were manually flagged and excluded from further
analysis. Data files generated by ScanAlyze were entered into a custom database
that maintains web-accessible files. Signal intensities between the two
fluorescent images were normalized by applying a uniform scale factor to all
intensities measured for the Cy5 channel. The normalization factor was
chosen so that the mean log(Cy3/Cy5) for a subset of spots that achieved a
minimum quality parameter (approximately 6,000 spots) was 0. This
effectively defines the signal-intensity-weighted 'average' spot on each
array to have a Cy3/Cy5 ratio of 1.0.
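As a rough illustration of the background subtraction and normalization just described, the sketch below computes a net signal and the single Cy5 scale factor that makes the mean log(Cy3/Cy5) equal to 0 over the spots passing a quality filter. It is not the ScanAlyze implementation; the function names, the boolean quality flags and the example intensities are assumptions made for the example.

```python
import math
import statistics

def net_signal(mean_spot_intensity, background_pixels):
    """Average spot intensity minus the median of the local background pixels."""
    return mean_spot_intensity - statistics.median(background_pixels)

def cy5_scale_factor(cy3, cy5, good):
    """Factor f such that mean(log(cy3 / (f * cy5))) is 0 over the 'good' spots."""
    logs = [math.log(a / b) for a, b, ok in zip(cy3, cy5, good) if ok]
    return math.exp(sum(logs) / len(logs))

def normalized_log_ratios(cy3, cy5, factor):
    """log(Cy3/Cy5) for every spot after scaling the Cy5 channel by factor."""
    return [math.log(a / (factor * b)) for a, b in zip(cy3, cy5)]

# Hypothetical net intensities; the last spot fails the quality filter.
cy3 = [200.0, 400.0, 800.0, 50.0]
cy5 = [100.0, 200.0, 400.0, 10.0]
good = [True, True, True, False]

f = cy5_scale_factor(cy3, cy5, good)
ratios = normalized_log_ratios(cy3, cy5, f)
```

Because the factor is the exponential of the mean log ratio, it is a geometric-mean correction: scaling Cy5 by it shifts every log ratio by the same constant and leaves differences between spots unchanged.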



Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by reverse
transcription from test cell mRNA in the presence of Cy5-dUTP, and from
the reference mRNA with Cy3-dUTP, using the Superscript II
reverse-transcription kit (Gibco-BRL). For each reverse transcription reaction, mRNA
(2 µg) was mixed with an anchored oligo-dT (d-20T-d(AGC)) primer (4
µg) in a total volume of 15 µl, heated to 70 °C for 10 min and cooled on ice.
To this sample, we added an unlabelled nucleotide pool (0.6 µl; 25 mM
each dATP, dCTP, dGTP, and 15 mM dTTP), either Cy3 or Cy5 conjugated
dUTP (3 µl; 1 mM; Amersham), 5× first-strand buffer (6 µl; 250 mM
Tris-HCl, pH 8.3, 375 mM KCl, 15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of
Superscript II reverse transcriptase (200 U/µl). After a 2-h incubation at 42
°C, the RNA was degraded by adding 1 N NaOH (1.5 µl) and incubating at
70 °C for 10 min. The mixture was neutralized by adding 1 N HCl (1.5
µl), and the volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA).
We added Cot1 human DNA (20 µg; Gibco-BRL), and purified the probe
by centrifugation in a Centricon-30 micro-concentrator (Amicon). The
two separate probes were combined, brought to a volume of 500 µl and
concentrated again to a volume of less than 7 µl. We added 10 µg/µl
poly(A) RNA (1 µl; Sigma) and tRNA (10 µg/µl; Gibco-BRL)
and adjusted the volume to 9.5 µl with distilled water. For final probe
preparation, 20×SSC (2.1 µl; 1.5 M NaCl, 150 mM NaCitrate, pH 8.0) and
10% SDS (0.35 µl) were added to a total final volume of 12 µl. The probes
were denatured by heating for 2 min at 100 °C, incubated at 37 °C for
20-30 min, and placed on the array under a 22 mm × 22 mm glass coverslip.
We incubated slides overnight at 65 °C for 14-18 h in a custom slide
chamber with humidity maintained by a small reservoir of 3×SSC. Arrays were
washed by submersion and agitation for 2-5 min in 2×SSC with 0.1% SDS,
followed by 1×SSC and then 0.1×SSC. The arrays were "spun dry" by
centrifugation for 2 min in a slide-rack in a Beckman GS-6 tabletop centrifuge
in Microplus carriers at 650 r.p.m.
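The component volumes listed for the reverse transcription reaction can be checked with simple arithmetic. The sketch below is only a bookkeeping aid: it assumes the volumes as read from the passage above (15 µl mRNA plus primer, 0.6 µl nucleotide pool, 3 µl Cy-dUTP, 6 µl buffer, 3 µl DTT, 2 µl enzyme), and the dictionary keys and helper name are hypothetical.

```python
# Component volumes in microlitres, as read from the protocol text.
volumes_ul = {
    "mRNA + oligo-dT primer": 15.0,
    "unlabelled nucleotide pool": 0.6,
    "Cy3- or Cy5-dUTP": 3.0,
    "5x first-strand buffer": 6.0,
    "0.1 M DTT": 3.0,
    "reverse transcriptase": 2.0,
}
total_ul = sum(volumes_ul.values())

def final_concentration(stock, volume_ul, total=total_ul):
    """Dilution of a stock concentration into the assembled reaction (same units as stock)."""
    return stock * volume_ul / total

datp_mm = final_concentration(25.0, volumes_ul["unlabelled nucleotide pool"])  # mM
dttp_mm = final_concentration(15.0, volumes_ul["unlabelled nucleotide pool"])  # mM
cy_dutp_mm = final_concentration(1.0, volumes_ul["Cy3- or Cy5-dUTP"])          # mM
buffer_x = final_concentration(5.0, volumes_ul["5x first-strand buffer"])      # fold

print(f"total volume: {total_ul:.1f} ul")
print(f"dATP/dCTP/dGTP: {datp_mm:.2f} mM, dTTP: {dttp_mm:.2f} mM, Cy-dUTP: {cy_dutp_mm:.2f} mM")
print(f"first-strand buffer: {buffer_x:.2f}x")
```

Under these assumptions the reaction comes to roughly 30 µl, so the 5× buffer ends up at about 1×, and dTTP is supplied at a lower concentration than the other unlabelled nucleotides, with the labelled dUTP added alongside it.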



Cluster analysis. We extracted tables (rows of genes, columns of individual
microarray hybridizations) of normalized fluorescence ratios from the
database. Various selection criteria, discussed in relation to each data set, were
applied to select subsets of genes from the 9,703 cDNA elements on the
arrays. Before clustering and display, the logarithms of the measured
fluorescence ratios for each gene were centred by subtracting the arithmetic mean of
all ratios measured for that gene. The centring makes all subsequent analyses
independent of the amount of each gene's mRNA in the reference pool.
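The effect of the centring step can be seen in a small sketch. The base of the logarithm is not stated in the text, so base 2 is an assumption here, as are the function name and the example ratios.

```python
import math

def centre_log_ratios(table):
    """Log-transform each row (gene) and subtract that row's mean log ratio."""
    centred = []
    for row in table:
        logs = [math.log2(r) for r in row]  # base 2 is an assumption
        mean = sum(logs) / len(logs)
        centred.append([x - mean for x in logs])
    return centred

# One hypothetical gene measured in three hybridizations.
ratios = [[1.0, 2.0, 4.0]]
# The same gene if its mRNA were 8-fold scarcer in the reference pool:
# every ratio is multiplied by 8, but the centred values are unchanged.
rescaled = [[8.0, 16.0, 32.0]]

print(centre_log_ratios(ratios))    # [[-1.0, 0.0, 1.0]]
print(centre_log_ratios(rescaled))  # [[-1.0, 0.0, 1.0]]
```

A change in the reference abundance of a gene multiplies all of its ratios by one constant, which becomes an additive constant after the log transform and is removed by subtracting the row mean.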

We applied a hierarchical clustering algorithm separately to the cell lines
and genes using the Pearson correlation coefficient as the measure of
similarity and average linkage clustering. The results of this process are
two dendrograms (trees), one for the cell lines and one for the genes, in
which very similar elements are connected by short branches, and longer
branches join elements with diminishing degrees of similarity. For visual
display the rows and columns in the initial data table were reordered to
conform to the structures of the dendrograms obtained from the cluster
analyses. Each cell in the cluster-ordered data table was replaced by a graded
colour (pure red through black to pure green), representing the
mean-adjusted ratio value in the cell. Gene labels in cluster diagrams are
displayed here only for genes that were represented in the microarray by
sequence-verified cDNAs. A complete software implementation of this
process is available (http://rana.stanford.edu/software), as well as all
clustering results (http://genome-www.stanford.edu/nci60).
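The clustering and reordering described above can be illustrated with a small pure-Python sketch. It is not the authors' software: it is a naive average-linkage procedure using 1 minus the Pearson correlation as the distance, the function names and toy data are hypothetical, and rows with zero variance are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage_order(rows):
    """Agglomerate rows by average linkage; return a dendrogram leaf order."""
    n = len(rows)
    dist = [[1.0 - pearson(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]

# Toy table: rows are genes, columns are hybridizations (centred log ratios).
table = [
    [-1.5, -0.5, 0.5, 1.5],   # gene 0: rising
    [1.5, 0.5, -0.5, -1.5],   # gene 1: falling
    [-1.4, -0.6, 0.4, 1.6],   # gene 2: rising
    [1.6, 0.4, -0.6, -1.4],   # gene 3: falling
]
gene_order = average_linkage_order(table)
column_order = average_linkage_order([list(col) for col in zip(*table)])
reordered = [[table[g][c] for c in column_order] for g in gene_order]
```

Genes and hybridizations are clustered separately, and the table is then reordered along both axes, so that similar profiles end up adjacent before any colour scale is applied.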



Array quantitation and data processing. Following hybridization, arrays
were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3 and Cy5. We
carried out data reduction with the program ScanAlyze (M.B.E., available
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